Metastatic Colorectal Cancer Signatures

This invention was made at least in part with assistance from the United States 
Federal Government, under Grant No. U01 CA88130 from the National Institutes of Health. 
As a result, the government may have certain rights to this invention.

BACKGROUND OF THE INVENTION 

Cancer of the colon and/or rectum (referred to as "colorectal cancer") is a
significant health problem in Western populations, particularly in the United States.
Cancers of the colon and rectum occur in both men and women, most commonly after the age
of 50. Colorectal cancer is the second leading cancer killer in the United States, and the
third most common cancer overall. This year, more than 50,000 Americans will die from
colorectal cancer and approximately 131,600 new cases will be diagnosed.

Mutations in tumor-suppressor genes, proto-oncogenes, and DNA repair genes are factors
known to influence tumorigenesis. For example, inactivation of both alleles of the
adenomatous polyposis coli (APC) gene, a tumor suppressor gene, appears to be one of the
earliest events in colorectal cancer, and may even be the initiating event. Other genes
implicated in colorectal cancer include the MCC gene, the p53 gene, the DCC (deleted in
colorectal carcinoma) gene and other chromosome 18q genes, and genes in the TGF-β
signaling pathway (for a review, see Molecular Biology of Colorectal Cancer, pp. 238-299,
in Curr. Probl. Cancer, Sept/Oct 1997; see also Williams, Colorectal Cancer (1996);
Kinsella & Schofield, Colorectal Cancer: A Scientific Perspective (1993); Colorectal
Cancer: Molecular Mechanisms, Premalignant State and its Prevention (Schmiegel &
Scholmerich eds., 2000); Colorectal Cancer: New Aspects of Molecular Biology and Their
Clinical Applications (Hanski et al., eds. 2000); McArdle et al., Colorectal Cancer (2000);
Wanebo, Colorectal Cancer (1993); Levin, The American Cancer Society: Colorectal Cancer
(1999); Treatment of Hepatic Metastases of Colorectal Cancer (Nordlinger & Jaeck eds.,
1993); Management of Colorectal Cancer (Dunitz et al., eds. 1998); Cancer: Principles and
Practice of Oncology (Devita et al., eds. 2001); Surgical Oncology: Contemporary Principles
and Practice (Kirby et al., eds. 2001); Offit, Clinical Cancer Genetics: Risk Counseling
and Management (1997); Radioimmunotherapy of Cancer (Abrams & Fritzberg eds. 2000);
Fleming, AJCC Cancer Staging Handbook (1998); Textbook of Radiation Oncology (Leibel &
Phillips eds. 2000); and Clinical Oncology (Abeloff et al., eds. 2000)).

As with all cancers, there are stages of disease progression, as well as expected
survival rates for these different stages. The American Cancer Society reports that the
5-year relative survival rate is 90% for people whose colorectal cancer is treated in an
early stage, before it has spread. However, only 37% of colorectal cancers are found at
that early stage. Once the cancer has spread to nearby organs or lymph nodes, the 5-year
relative survival rate goes down to 65%. For people whose colorectal cancer has spread to
distant parts of the body such as the liver or lungs, the 5-year relative survival rate is
9%. Thus, metastases of the tumor to the liver, lungs, and regional lymph nodes are
important prognostic factors (see, e.g., PET in Oncology: Basics and Clinical Application
(Ruhlmann et al. eds. 1999)).

Since tumor metastasis is the principal cause of death for cancer patients, a better
understanding of the various factors involved in this process, especially the gene
expression exhibited by these cancers, will have prognostic and diagnostic value. Indeed,
patterns of gene expression associated with the various stages of these cancers would
provide an important tool in the selection of treatment alternatives.

Comparing the gene expression profiles of different cells and tissues can 
provide information about the identity of the tissue, the health status of the tissue and other 
properties. For example, genes that are differentially expressed in healthy and pathologic 
cells can function as diagnostic markers. Additionally, such genes are candidate targets for 
regulation by therapeutic intervention.

There are numerous methods presently in use for generating gene expression 
profiles of a cell or tissue. However, there remains a need in the art for methods that utilize 
the information embodied in a gene expression profile for the benefit of diagnosing, treating 
or determining the probable prognosis of disease. 

Accordingly, provided herein are methods that can be used in diagnosis and prognosis
evaluation of metastatic colorectal cancer. Further provided are methods that can be used
to screen candidate therapeutic agents for the ability to modulate, e.g., treat,
colorectal cancer. Additionally, provided herein are molecular targets and compositions
for therapeutic intervention in metastatic colorectal disease and other metastatic cancers.

BRIEF SUMMARY OF THE INVENTION 

The present invention provides materials and methods for characterizing biological
samples, thereby providing diagnostic methods for identifying cells and tissues and
evaluating their physiological status. The methods involve obtaining a biological sample,
generating a gene expression profile of the biological sample, and comparing the gene
expression profile of a select group of genes from the biological sample with the gene
expression profiles represented by the reference sets of Tables 1-6.

The select groups of genes used for comparison, identification, and diagnosis of the
health status of a biological sample comprise the reference sets of Tables 1-6. The
reference sets of Tables 1-6 comprise genes selected for their high signal-to-noise ratio
in reference samples. These genes, herein referred to as "classifier genes," provide
maximum information regarding the nature and identity of a given biological sample.

In one aspect, the invention provides a method of diagnosing the health status of a
biological sample comprising the steps of: generating a gene expression pattern of the
biological sample, and comparing the gene expression pattern of the biological sample with
the reference sets of Tables 1-6, wherein a match between the gene expression pattern of
one or more genes in the biological sample and one or more genes of Tables 1-6 provides a
diagnosis of the biological sample. In one embodiment, the biological sample comprises
cells obtained from a biopsy sample. In another embodiment, the biological sample is
diagnosed as healthy tissue. In yet another embodiment, the biological sample is diagnosed
as having metastatic colorectal cancer.

In one embodiment, analysis of the gene expression pattern of the biological sample
indicates that the colon cancer is likely to develop future metastasis.

In one embodiment, the diagnosis of the biological sample is made with 
reference to at least five different classifier genes from Tables 1-6. 
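
By way of non-limiting illustration, the following Python sketch shows one way a "match" between the gene expression pattern of a sample and a reference set might be scored. The gene identifiers, the expression values, and the choice of Pearson correlation as the similarity measure are assumptions made solely for this example; the specification does not prescribe a particular measure. The requirement of at least five shared classifier genes mirrors the embodiment above.

```python
from math import sqrt

MIN_CLASSIFIER_GENES = 5  # mirrors the embodiment above: at least five classifier genes


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def match_score(sample, reference):
    """Score a sample against one reference set over the genes they share."""
    shared = sorted(set(sample) & set(reference))
    if len(shared) < MIN_CLASSIFIER_GENES:
        raise ValueError("fewer than five shared classifier genes")
    return pearson([sample[g] for g in shared], [reference[g] for g in shared])


def best_match(sample, reference_sets):
    """Return (name, score) of the reference set most similar to the sample."""
    scores = {name: match_score(sample, ref) for name, ref in reference_sets.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical gene identifiers and log-scale expression values (not from Tables 1-6).
reference_sets = {
    "normal colon": {"G1": 2.0, "G2": 8.0, "G3": 5.0, "G4": 1.0, "G5": 7.0},
    "hepatic metastasis": {"G1": 9.0, "G2": 2.0, "G3": 6.0, "G4": 8.0, "G5": 1.0},
}
sample = {"G1": 8.5, "G2": 2.5, "G3": 5.5, "G4": 7.0, "G5": 1.5}
label, score = best_match(sample, reference_sets)
```

In this sketch the sample correlates positively with the hypothetical hepatic metastasis profile and negatively with the hypothetical normal colon profile, so the former is returned as the best match.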

In another embodiment, comparison of the gene expression pattern of the 
biological sample and the reference sets identifies the tissue origin of the metastatic cancer. 

In one embodiment, the comparison of the gene expression pattern of the biological
sample and the reference sets is made by comparing RNA expression profiles.

In another embodiment, the comparison of the gene expression pattern of the 
biological sample and the reference sets is made by comparing protein expression profiles. 
In one embodiment, the protein expression profile is evaluated using antibodies. 

In one aspect, the invention provides a method for prognosis evaluation of metastatic
colorectal cancer comprising the steps of: generating a gene expression pattern of a
biological sample, and comparing the gene expression pattern of the biological sample
with the reference sets of Tables 1-6, wherein a match between the gene expression
pattern of the biological sample and one or more reference sets provides a prognosis
evaluation of the metastatic potential of the colorectal cancer. In one embodiment, a match
between the gene expression pattern of the biological sample and the reference set
representing colon cancer hepatic metastases is indicative of poor prognosis.

In another aspect, the invention provides a method for evaluating the progress of
treatment of metastatic colorectal cancer comprising the steps of: generating a first gene
expression pattern of a first biological sample from a patient, comparing the first gene
expression pattern of the first biological sample with the reference sets of Tables 1-6,
obtaining a match between the first gene expression pattern of the first biological sample
and one or more reference sets of Tables 1-6, thereby providing an initial diagnosis of
metastatic colorectal cancer, then administering to the patient a therapeutically effective
amount of a compound that modulates the metastatic colorectal cancer, generating a second
gene expression pattern of a second biological sample from the patient, and comparing the
second gene expression pattern of the second biological sample with the reference sets of
Tables 1-6, then comparing the match between the second gene expression pattern of the
second biological sample and the match between the first gene expression pattern of the
first biological sample, wherein the comparison indicates the progress of the treatment for
metastatic colorectal cancer.

In another aspect, the invention provides a method for evaluating the efficacy of drug
candidates for the treatment of metastatic colorectal cancer, comprising the steps of:
contacting a cell or tissue culture that has a gene expression profile indicative of
metastatic colorectal cancer with an effective amount of a test compound, generating a
gene expression profile of the contacted cell or tissue culture, comparing the gene
expression pattern of the contacted cell culture with the defined sets of genes of Tables
1-6, and obtaining a match between the gene expression pattern of the contacted cell
culture and one or more of the defined sets of genes, thereby determining the efficacy of
the test compound for the treatment of metastatic colorectal cancer.

In another aspect, the invention provides a kit for identifying the gene expression
pattern of a biological sample comprising: nucleic acid probes that specifically bind to
nucleotide sequences from reference sets of Tables 1-6, and means of labeling nucleic
acids. In one embodiment, the kit comprises nucleic acid probes that identify metastatic
cancer derived from a primary tumor in an organ selected from the group consisting of
heart, lung, pancreas, breast, prostate, and colon.

In another aspect, the invention provides a kit for identifying the gene expression
pattern of a biological sample comprising: antibodies or ligands that specifically bind to
polypeptides encoded by genes of the reference sets of Tables 1-6, and means of labeling
the antibodies or ligands that specifically bind to polypeptides encoded by genes of the
reference sets of Tables 1-6. In one aspect, the kit provides antibodies or ligands that
identify metastatic cancer derived from a primary tumor in an organ selected from the
group consisting of lung, pancreas, breast, prostate, and colon.

DETAILED DESCRIPTION OF THE INVENTION 

Definitions 

By "metastatic colorectal cancer" herein is meant a colon and/or rectal tumor or cancer
that is classified as Dukes stage C or D (see, e.g., Cohen et al., Cancer of the Colon, in
Cancer: Principles and Practice of Oncology, pp. 1144-1197 (Devita et al., eds., 5th ed.
1997); see also Harrison's Principles of Internal Medicine, pp. 1289-129 (Wilson et al.,
eds., 12th ed., 1991)). "Treatment, monitoring, detection or modulation of metastatic
colorectal cancer" includes treatment, monitoring, detection, or modulation of metastatic
colorectal disease in those patients who have metastatic colorectal disease (Dukes stage C
or D). In Dukes stage A, the tumor has penetrated into, but not through, the bowel wall. In
Dukes stage B, the tumor has penetrated through the bowel wall but there is not yet any
lymph node involvement. In Dukes stage C, the cancer involves regional lymph nodes. In
Dukes stage D, there is distant metastasis, e.g., liver, lung, etc.

The term "metastasis" refers to the process by which a disease shifts from 
one part of the body to another. This process may include the spreading of neoplasms from 
the site of a primary tumor to distant parts of the body. 

The term "metastatic cancer" refers to any cancer in any part of the body which has its
origins in primary cancer at a site distant from the location of the secondary tumor.
Metastatic cancer includes, but is not limited to, true "metastatic tumors" as well as
pre-metastatic primary tumor cells in the process of developing a metastatic phenotype.

The term "metastatic potential" refers to the likelihood that a particular tumor will
metastasize. A tumor with metastatic potential has a high likelihood of progressing to
metastatic cancer.

The term "secondary tumor" refers to a metastatic tumor that has
developed at a site distant from the location of the original, primary cancer.

"Classifier genes" are genes selected for the purpose of comparison and 
identification of biological samples. Classifier genes are selected by virtue of the high 
signal-to-noise ratio and reproducibility they display when measured in reference samples.
Classifier genes are considered "maximally informative genes" because the ability to clearly 
and reliably detect them provides maximum information regarding the nature and identity of 
a given biological sample. 
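
As a hedged illustration of how candidate classifier genes might be ranked, the Python sketch below scores each gene by a signal-to-noise statistic computed between two groups of reference samples. The formula used here, the difference of the group means divided by the sum of the group standard deviations, is an assumption adopted for the example, since this section does not define the ratio; the gene names and measurements are likewise hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """(mean_a - mean_b) / (sd_a + sd_b): an assumed definition, not taken from the text."""
    noise = stdev(group_a) + stdev(group_b)
    if noise == 0:
        raise ValueError("zero variance in both groups")
    return (mean(group_a) - mean(group_b)) / noise


def rank_classifier_candidates(expression, top_n):
    """Rank genes by absolute signal-to-noise ratio, highest first.

    expression maps a gene name to (values_in_state_a, values_in_state_b).
    """
    scored = {gene: signal_to_noise(a, b) for gene, (a, b) in expression.items()}
    ranked = sorted(scored, key=lambda gene: abs(scored[gene]), reverse=True)
    return [(gene, scored[gene]) for gene in ranked[:top_n]]


# Hypothetical reference measurements for two physiological states.
expression = {
    "GENE_A": ([10.0, 11.0, 9.0], [2.0, 3.0, 1.0]),  # large, reproducible difference
    "GENE_B": ([5.0, 9.0, 1.0], [4.0, 8.0, 0.0]),    # small difference, noisy
    "GENE_C": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),    # reproducibly lower in state A
}
top = rank_classifier_candidates(expression, top_n=2)
```

Ranking on the absolute value keeps genes that are reproducibly lower in one state as well as genes that are reproducibly higher, both of which can be informative.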

A specific classifier gene may or may not be uniquely expressed in a particular cell,
tissue, or organ. In some applications, the classifier gene may be tissue-specific; that
is, expressed exclusively in a particular tissue or cell type. In other applications, the
classifier gene may be expressed predominantly in one tissue type, but could also be
expressed in other cells, tissues or organs, but in a different relationship with the other
classifier genes of the set. Thus, the level of expression of a classifier gene, and its
relationship within a pattern of co-expressed genes, creates a unique profile that can be
used to infer the identity and physiology of an unknown biological sample.

Classifier genes may encode intracellular molecules, e.g., cellular nucleic 
acids, intracellular proteins, and the intracellular domains of transmembrane proteins, or 
extracellular molecules such as the extracellular domains of transmembrane proteins or 
secreted proteins. Intracellular and extracellular classifier molecules are equally suitable.

The protein product of a classifier gene may be referred to herein as a
"classifier protein". Similarly, "classifier molecule" may be used herein to refer
collectively to both classifier genes and classifier proteins.

Subsets of classifier genes representative of the gene expression patterns of different
cells, tissues, organs and physiological states of disease and health are organized into
the reference sets of Tables 1-6.

The term "metastatic colorectal cancer classifier protein" or "metastatic colorectal
cancer classifier polynucleotide" or "metastatic colorectal cancer classifier gene
sequences" refers to nucleic acid and polypeptide polymorphic variants, alleles, mutants,
and interspecies homologs that: (1) have a nucleotide sequence that has greater than about
60% nucleotide sequence identity, 65%, 70%, 75%, 80%, 85%, 90%, preferably 91%, 92%, 93%,
94%, 95%, 96%, 97%, 98% or 99% or greater nucleotide sequence identity, preferably over a
region of at least about 25, 50, 100, 200, 500, 1000, or more nucleotides, to a nucleotide
sequence of or associated with a UniGene cluster of Tables 1-6; (2) bind to antibodies,
e.g., polyclonal antibodies, raised against an immunogen comprising an amino acid sequence
encoded by a nucleotide sequence of or associated with a UniGene cluster of Tables 1-6, and
conservatively modified variants thereof; (3) specifically hybridize under stringent
hybridization conditions to a nucleic acid sequence of Tables 1-6, or the complement
thereof, and conservatively modified variants thereof; or (4) have an amino acid sequence
that has greater than about 60% amino acid sequence identity, 65%, 70%, 75%, 80%, 85%, 90%,
preferably 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% or greater amino acid sequence
identity, preferably over a region of at least about 25, 50, 100, 200, 500, 1000, or more
amino acids, to an amino acid sequence encoded by a nucleotide sequence of or associated
with a UniGene cluster of Tables 1-6. A polynucleotide or polypeptide sequence is typically
from a mammal including, but not limited to, primate, e.g., human; rodent, e.g., rat,
mouse, hamster; cow, pig, horse, sheep, or other mammal. A "metastatic colorectal cancer
classifier gene sequence" includes both naturally occurring and recombinant nucleotide and
protein sequences.

"Reference set" refers to defined sets of classifier genes that characterize a particular
tissue, organ, cell, cell culture or physiological state of a biological sample. The
reference set may form part of an organized hierarchical structure for the classification
of individual tissues or organs. If the reference set is part of an organized hierarchical
structure, it may be used to identify or distinguish a sample at either the highest or
lowest level of classification, or it may contain defined sets of genes representing one or
more levels of classification for a given tissue or organ and therefore use several levels
simultaneously to identify a sample.
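
The sequence identity criteria recited in items (1) and (4) of the classifier gene sequence definition above can be illustrated with the following minimal Python sketch. It assumes the two sequences have already been aligned to equal length (gaps written as "-") by a separate alignment step that is not shown, and it counts a gap opposite a residue as a mismatch, which is one convention among several. The example sequences and the default thresholds (greater than 60% identity over at least 25 positions, the least stringent values recited) are chosen for illustration only.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned positions to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a, aligned_b, min_identity=60.0, min_region=25):
    """True if identity exceeds min_identity over a region of at least min_region."""
    compared = sum(1 for a, b in zip(aligned_a, aligned_b)
                   if not (a == "-" and b == "-"))
    return compared >= min_region and percent_identity(aligned_a, aligned_b) > min_identity


# Hypothetical 30-nucleotide aligned sequences that differ at three positions.
seq_a = "ATGGCCATTGTAATGGGCCGCTGAAAGGGT"
seq_b = "ATGGCCATTGTTATGGGCCGATGAAAGGCT"
identity = percent_identity(seq_a, seq_b)  # 27 of 30 positions match
```

In practice the alignment itself determines the result, so an established alignment program would be used to produce the aligned inputs.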

Table 1 illustrates the hierarchical structure of classification that orders the defined
sets of classifier genes comprising the reference sets of the invention. These defined sets
of classifier genes can be used to characterize individual tissues and organs from humans.
The defined sets of genes are organized hierarchically to permit identification of a sample
on several levels of detail. For example, using the reference sets of classifier genes of
Tables 1-6, it is possible to determine that a sample comprises adipose tissue. Within the
context of this reference set that identifies adipose tissue, further analysis could reveal
other defined sets of classifier genes which, when compared to the reference sets of
classifier genes in Tables 1-6, identify the sample as being mammary tissue as opposed to
omental tissue or simple adipose tissue. The sample could be still further analyzed within
the context of the reference set that characterizes adipose tissue, to determine that the
sample is a sample of breast tissue.
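
The level-by-level identification described above can be sketched, under stated assumptions, as a descent through a tree of reference sets. In the Python fragment below, the tree layout, the node names other than those mentioned in the text, the gene identifiers, the expression values, and the use of Euclidean distance to pick the closest reference set at each level are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class ReferenceNode:
    """One node of a hypothetical reference-set hierarchy."""
    name: str
    profile: dict                                   # classifier gene -> expected expression
    children: list = field(default_factory=list)    # finer-grained reference sets


def distance(sample, profile):
    """Euclidean distance over the classifier genes of one reference profile.

    The sample must contain a value for every gene in the profile.
    """
    return sqrt(sum((sample[gene] - value) ** 2 for gene, value in profile.items()))


def classify(sample, root):
    """Descend the hierarchy, choosing the closest child at each level.

    Sibling nodes are assumed to be defined over the same classifier genes,
    so that their distances to the sample are comparable.
    """
    path = []
    node = root
    while node.children:
        node = min(node.children, key=lambda child: distance(sample, child.profile))
        path.append(node.name)
    return path


# Hypothetical hierarchy: adipose versus muscle, then mammary versus omental.
mammary = ReferenceNode("mammary", {"G3": 9.0, "G4": 1.0})
omental = ReferenceNode("omental", {"G3": 1.0, "G4": 9.0})
adipose = ReferenceNode("adipose", {"G1": 9.0, "G2": 1.0}, [mammary, omental])
muscle = ReferenceNode("muscle", {"G1": 1.0, "G2": 9.0})
root = ReferenceNode("root", {}, [adipose, muscle])

sample = {"G1": 8.0, "G2": 2.0, "G3": 8.5, "G4": 1.5}
path = classify(sample, root)
```

Each level consults only its own classifier genes, which reflects the statement that a sample can be identified at several levels of detail.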

A "signature" refers to a specific pattern of gene expression as reflected in a particular
defined set of classifier genes of Tables 1-6. The "signature" of a biological sample is a
unique identifier of the sample.

A "tissue" refers to a complex, integrated group of cohesive, typically spatially
aggregated cells that share a common structure and/or function; certain "tissues" are
disperse, e.g., blood cells or skin. Alternatively, complex assemblies of tissues form
functional systems of organs. See, e.g., Rohen, et al. (2002) Color Atlas of Anatomy: A
Photographic Study of the Human Body Lippincott; Hiatt, et al. (2000) Color Atlas of
Histology Lippincott.

"Biological sample" refers to a sample derived from a virus, cell, tissue, 
organ, or organism including, without limitation, cell, tissue or organ lysates or
homogenates, or body fluid samples, such as blood, urine, sputum, or cerebrospinal fluid.
Such samples include, but are not limited to, tissue isolated from humans, or explants,
primary, and transformed cell cultures derived therefrom. Biological samples may also
include sections of tissues such as frozen sections taken for histologic purposes. A
biological sample can be obtained from a eukaryotic organism such as fungi, plants,
insects, protozoa, birds, fish, reptiles, and preferably a mammal such as rat, mouse, cow,
dog, guinea pig, or rabbit, and most preferably a primate such as cynomolgus monkeys,
rhesus monkeys, chimpanzees, or humans.

"Encoding" refers to the property of specific sequences of nucleotides in a 
polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of
other polymers and macromolecules in biological processes having either a defined sequence
of nucleotides (e.g., rRNA, tRNA, and mRNA) or a defined sequence of amino acids and the
biological properties resulting therefrom. A gene encodes a protein if transcription and
translation of mRNA produced by that gene produces the protein in a cell or other
biological system. Both the coding strand, the nucleotide sequence of which is identical to
the mRNA sequence and is usually provided in sequence listings, and the non-coding strand,
used as the template for transcription, of a gene or cDNA, can be referred to as encoding
the protein or other product of that gene or cDNA. Unless otherwise specified, a
"nucleotide sequence encoding an amino acid sequence" includes all nucleotide sequences
that are degenerate versions of each other and that encode the same amino acid sequence.
Nucleotide sequences that encode proteins and RNA may include introns. See, e.g., Lodish,
et al. (2000) Mol. Cell Biol. (4th ed.) Freeman; Alberts, et al. (1994) Mol. Biol. Cell
Garland.

"Differential expression" or grammatical equivalents as used herein, refers to qualitative
or quantitative differences in the temporal and/or cellular gene expression patterns within
and among cells and tissue. Thus, a differentially expressed gene can qualitatively have
its expression altered, including an activation or inactivation, in, e.g., normal versus
metastatic colorectal cancer tissue. Genes may be turned on or turned off in a particular
state, relative to another state, thus permitting comparison of two or more states. A
qualitatively regulated gene will exhibit an expression pattern within a state or cell type
which is detectable by standard techniques. Some genes will be expressed in one state or
cell type, but not in both. Alternatively, the difference in expression may be
quantitative, e.g., in that expression is increased or decreased; i.e., gene expression is
either upregulated, resulting in an increased amount of transcript, or downregulated,
resulting in a decreased amount of transcript. The degree to which expression differs need
only be large enough to quantify via standard characterization techniques as outlined
below, such as by use of Affymetrix GeneChip™ expression arrays (Lockhart, Nature
Biotechnology 14:1675-1680 (1996), hereby expressly incorporated by reference). Other
techniques include, but are not limited to, quantitative reverse transcriptase PCR,
northern analysis and RNase protection.

A component of a biological sample is differentially expressed between two samples if the
difference in amount of the component in one sample vs. the amount in the other sample is
statistically significant. For example, the change in expression (i.e., upregulation or
downregulation) is typically at least about 50%, more preferably at least about 100%, more
preferably at least about 150%, more preferably at least 180%, 200%, 300%, 500%, 700%,
900%, or 1000% of the amount in the other sample, or the component is detectable in one
sample and not detectable in the other.
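
A minimal sketch of the quantitative and qualitative criteria above is given below in Python. It covers only the percent-change threshold and the detectable versus not-detectable case; the statistical significance test that the definition also calls for is deliberately omitted, and the detection limit, the default 50% threshold, and the choice of the second sample as the baseline are assumptions made for this example.

```python
def percent_change(amount_a, amount_b):
    """Absolute change of amount_a relative to amount_b, as a percentage of amount_b."""
    if amount_b == 0:
        raise ValueError("baseline amount must be non-zero")
    return 100.0 * abs(amount_a - amount_b) / amount_b


def is_differentially_expressed(amount_a, amount_b, min_change=50.0, detection_limit=1.0):
    """Apply the percent-change and detectability criteria (no significance test).

    detection_limit is a hypothetical assay floor below which a component is
    treated as not detectable.
    """
    undetected_a = amount_a < detection_limit
    undetected_b = amount_b < detection_limit
    if undetected_a and undetected_b:
        return False   # not detectable in either sample
    if undetected_a != undetected_b:
        return True    # detectable in one sample and not in the other
    return percent_change(amount_a, amount_b) >= min_change


# Hypothetical expression amounts in arbitrary units.
upregulated = is_differentially_expressed(15.0, 10.0)   # 50% change
unchanged = is_differentially_expressed(12.0, 10.0)     # 20% change
on_off = is_differentially_expressed(5.0, 0.2)          # detectable in only one sample
```

A fuller treatment would combine these checks with replicate measurements and a significance test before calling a component differentially expressed.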

"Gene expression profile" refers to the identification of at least one mRNA 
or protein expressed in a biological sample. 

"Nucleic acid array" refers to an array of addressable locations (e.g., a location
characterized by a distinctive, interrogatable address), each addressable location
comprising a characteristic nucleic acid attached thereto. A nucleic acid, as defined
herein, may be a naturally occurring or synthetic nucleic acid, e.g., an oligonucleotide or
polynucleotide. In an oligonucleotide array, the nucleic acid is an oligonucleotide (e.g.,
corresponding to an exon, EST, or a portion of a gene, transcript, or cDNA); in an EST
array the nucleic acid is an EST or portion thereof; in an mRNA array the nucleic acid is an
mRNA or portion thereof, or a corresponding cDNA. An oligonucleotide can be from 4, 6, 8,
10, or 12 nucleotides or longer in length, often 10, 30, 40, or 50 nucleotides in length, up
to about 100 nucleotides in length. See Kohane, et al. (2002) Microarrays for Integrative
Genomics MIT Press; Baldi and Hatfield (2002) DNA Microarrays and Gene Expression
Cambridge Univ. Press.

"Detect" refers to identifying the presence, absence or amount of the object to be
detected. "Detectable moiety" or a "label" refers to a composition detectable by
spectroscopic, photochemical, biochemical, immunochemical, or chemical means. For
example, useful labels include 32P, 35S, fluorescent dyes, electron-dense reagents, enzymes
(e.g., as commonly used in an ELISA), biotin-streptavidin, digoxigenin, haptens and
proteins for which antisera or monoclonal antibodies are available, or nucleic acid
molecules with a sequence complementary to a target. The detectable moiety often generates
a measurable signal, such as a radioactive, chromogenic, or fluorescent signal, that can be
used to quantify the amount of bound detectable moiety in a sample. Quantitation of the
signal is achieved by, e.g., scintillation counting, densitometry, or flow cytometry.

As used herein, a "nucleic acid probe or oligonucleotide" is defined as a nucleic acid
capable of binding to a target nucleic acid of complementary sequence through one or more
types of chemical bonds, usually through complementary base pairing, usually through
hydrogen bond formation. As used herein, a probe may include natural (e.g., A, G, C, or T)
or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in a probe may
be joined by a linkage other than a phosphodiester bond, so long as it does not interfere
with hybridization. Thus, for example, probes may be peptide nucleic acids in which the
constituent bases are joined by peptide bonds rather than phosphodiester linkages. It will
be understood by one of skill in the art that probes may bind target sequences lacking
complete complementarity with the probe sequence depending upon the stringency of the
hybridization conditions. The probes are preferably directly labeled, as with isotopes,
chromophores, lumiphores, or chromogens, or indirectly labeled, such as with biotin to
which a streptavidin complex may later bind. By assaying for the presence or absence of the
probe, one can detect the presence or absence of the select sequence or subsequence.
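
The notion that a probe binds a target of complementary sequence, and may tolerate incomplete complementarity at lower stringency, can be illustrated with the toy Python sketch below. It searches a single-stranded target for sites that pair with a probe and uses a permitted mismatch count as a crude, assumed stand-in for stringency; actual hybridization depends on temperature, salt concentration, probe length, and base composition, none of which is modeled here. The sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_probe_sites(probe, target, max_mismatches=0):
    """Return (position, mismatches) for target sites the probe could pair with.

    max_mismatches is a simplistic proxy for hybridization stringency:
    0 demands complete complementarity, larger values tolerate mismatches.
    """
    site = reverse_complement(probe)  # sequence on the target that pairs with the probe
    target = target.upper()
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(1 for a, b in zip(site, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical target and probes.
target = "GGATCCGATTACAGGCTA"
perfect_probe = "CTGTAATC"    # reverse complement of GATTACAG
imperfect_probe = "CTGTTATC"  # one base differs from the perfect probe

strict_hits = find_probe_sites(perfect_probe, target)
relaxed_hits = find_probe_sites(imperfect_probe, target, max_mismatches=1)
```

With no mismatches allowed the imperfect probe finds no site, while allowing one mismatch recovers the same position found by the perfect probe.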

A "labeled nucleic acid probe or oligonucleotide" is one that is bound, either
covalently, through a linker or a chemical bond, or noncovalently, through ionic, van der
Waals, electrostatic, or hydrogen bonds to a label such that the presence of the probe may
be detected by detecting the presence of the label bound to the probe.

"Antibody" refers to a polypeptide comprising a framework region from an immunoglobulin
gene or fragments thereof that specifically binds and recognizes an antigen. The recognized
immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon, and mu
constant region genes, as well as the myriad immunoglobulin variable region genes. Light
chains are classified as either kappa or lambda. Heavy chains are classified as gamma, mu,
alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA,
IgD and IgE, respectively. See Paul (1999) Fundamental Immunology (4th ed.) Raven.

An exemplary immunoglobulin (antibody) structural unit comprises a tetramer. Each tetramer
is composed of two identical pairs of polypeptide chains, each pair having one "light"
(about 25 kD) and one "heavy" chain (about 50-70 kD). The N-terminus of each chain defines
a variable region of about 100 to 110 or more amino acids primarily responsible for antigen
recognition. The terms variable light chain (VL) and variable heavy chain (VH) refer to
these light and heavy chains respectively.

Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized
fragments produced by digestion with various peptidases. Thus, for example, pepsin digests
an antibody below the disulfide linkages in the hinge region to produce F(ab)'2, a dimer of
Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)'2 may be
reduced under mild conditions to break the disulfide linkage in the hinge region, thereby
converting the F(ab)'2 dimer into an Fab' monomer. The Fab' monomer is essentially Fab with
part of the hinge region (see Fundamental Immunology (Paul ed., 4th ed. 1999)). While
various antibody fragments are defined in terms of the digestion of an intact antibody, one
of skill will appreciate that such fragments may be synthesized de novo either chemically
or by using recombinant DNA methodology. Thus, the term antibody, as used herein, also
includes antibody fragments either produced by the modification of whole antibodies, or
those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv,
diabodies [dimers of scFv], minibodies [scFv-CH3 fusion proteins]) or those identified
using phage display libraries (see, e.g., McCafferty et al., Nature 348:552-554 (1990)).

Monoclonal or polyclonal antibodies may be prepared by many techniques. See, e.g., Kohler
& Milstein, Nature 256:495-497 (1975); Kozbor et al., Immunology Today 4:72 (1983); Cole
et al., pp. 77-96 in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. (1985).
Techniques for the production of single chain antibodies (U.S. Patent 4,946,778) can be
adapted to produce antibodies to polypeptides of this invention. Also, transgenic mice, or
other organisms such as other mammals, may be used to express humanized antibodies.
Alternatively, phage display technology can be used to identify antibodies and heteromeric
Fab fragments that specifically bind to selected antigens. See, e.g., McCafferty et al.,
Nature 348:552-554 (1990); Marks et al., Biotechnology 10:779-783 (1992).

A "chimeric antibody" is an antibody molecule in which (a) the constant region, or a
portion thereof, is altered, replaced or exchanged so that the antigen binding site
(variable region) is linked to a constant region of a different or altered class, effector
function and/or species, or an entirely different molecule which confers new properties to
the chimeric antibody, e.g., an enzyme, toxin, hormone, growth factor, drug, etc.; or (b)
the variable region, or a portion thereof, is altered, replaced or exchanged with a
variable region having a different or altered antigen specificity.

An "immunoassay" is an assay that uses an antibody to specifically bind an antigen. The
immunoassay is characterized by the use of specific binding properties of a particular
antibody to isolate, target and/or quantify the antigen. See Coligan, et al. (1993 and
supplements) Current Protocols in Immunology Wiley.

When used in the context of an antibody-antigen reaction, "specific" or 
"selective binding" of an antibody refers to a binding reaction that is determinative of the 
presence of the antigen in a heterogeneous population of proteins and other biologics. Thus,
under designated immunoassay conditions, the specified antibodies bind to a particular
protein at least two times the background and do not substantially bind in a significant
amount to other proteins present in the sample. Specific binding to an antibody under such
conditions may require an antibody that is selected for its specificity for a particular protein.
For example, polyclonal antibodies raised to a polypeptide encoded by a polynucleotide of
Tables 2-5, or splice variants, or portions thereof, can be selected to obtain only those
polyclonal antibodies that are specifically immunoreactive with the selected polypeptide 
and not with other proteins. Where the target protein is a member of a family such as 
GPCRs, this selection may be achieved by subtracting out antibodies that cross-react with
molecules such as other GPCR family members. In addition, polyclonal antibodies raised to
target polymorphic variants, alleles, orthologs, and conservatively modified variants can be
selected to obtain only those antibodies that recognize the target protein, but not other
GPCR family members. In addition, antibodies reactive to human target proteins but not
homologs from other species can be selected in the same manner. A variety of
immunoassay formats may be used to select antibodies specifically immunoreactive with a
particular protein. For example, solid-phase ELISA immunoassays are routinely used to
select antibodies specifically immunoreactive with a protein (see, e.g., Harlow and Lane,
Using Antibodies: A Laboratory Manual, New York: Cold Spring Harbor Laboratory Press
(1998), for a description of immunoassay formats and conditions that can be used to
determine specific immunoreactivity).

The terms "isolated," "purified," or "biologically pure" refer to material 
that is substantially or essentially free from components that normally accompany it as 
found in its native state. Purity and homogeneity are typically determined using analytical
chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid
chromatography. A protein that is the predominant species present in a preparation is
substantially purified. In particular, an isolated nucleic acid of Tables 2-6 encoding a
polypeptide is separated from open reading frames that flank the polypeptide coding
sequence and encode proteins other than the polypeptide of interest. The term
"purified" denotes that a nucleic acid or protein gives rise to essentially one band in an
electrophoretic gel. Particularly, it means that the nucleic acid or protein is at least 85%
pure, more preferably at least 95% pure, and most preferably at least 99% pure. See, e.g.,
Walsh (2002) Proteins: Biochemistry and Biotechnology Wiley; Hardin, et al. (eds. 2001)
Cloning, Gene Expression and Protein Purification Oxford Univ. Press; Wilson, et al. (eds.
2000) Encyclopedia of Separation Science Academic Press.

"Nucleic acid" refers to deoxyribonucleotides or ribonucleotides and 
polymers thereof in either single- or double-stranded form. The term encompasses nucleic 
acids containing known nucleotide analogs or modified backbone residues or linkages, 
which are synthetic, naturally occurring, and non-naturally occurring, which have similar
binding properties as the reference nucleic acid, and which are metabolized in a manner
similar to the reference nucleotides. Examples of such analogs include, without limitation,
phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates,
2-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs).

Unless otherwise indicated, a particular nucleic acid sequence also implicitly
encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions)
and complementary sequences, as well as the sequence explicitly indicated. Specifically,
degenerate codon substitutions may be achieved by generating sequences in which the third
position of one or more selected (or all) codons is substituted with mixed-base and/or
deoxyinosine residues (Batzer et al., Nucleic Acid Res. 19:5081 (1991); Ohtsuka et al., J.
Biol. Chem. 260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8:91-98 (1994)).
The term nucleic acid is used interchangeably with gene, cDNA, mRNA, oligonucleotide, 
and polynucleotide. 

A particular nucleic acid sequence also implicitly encompasses "splice
variants." Similarly, a particular protein encoded by a nucleic acid implicitly encompasses
any protein encoded by a splice variant of that nucleic acid. "Splice variants," as the name
suggests, are products of alternative splicing of a gene. After transcription, an initial nucleic
acid transcript may be spliced such that different (alternate) nucleic acid splice products
encode different polypeptides. Mechanisms for the production of splice variants vary, but
include alternate splicing of exons. Alternate polypeptides derived from the same nucleic 
acid by read-through transcription are also encompassed by this definition. Products of a 
splicing reaction, including recombinant forms of the splice products, are included in this 
definition. 

The terms "polypeptide," "peptide" and "protein" are used
interchangeably herein to refer to a polymer of amino acid residues. The terms apply to
amino acid polymers in which one or more amino acid residue is an artificial chemical 
mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring 
amino acid polymers and non-naturally occurring amino acid polymers. 

The term "amino acid" refers to naturally occurring and synthetic amino
acids, as well as amino acid analogs and amino acid mimetics that function in a manner
similar to the naturally occurring amino acids. Naturally occurring amino acids are those
encoded by the genetic code, as well as those amino acids that are later modified, e.g.,
hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. Amino acid analog refers to
compounds that have the same basic chemical structure as a naturally occurring amino acid,
i.e., a carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R
group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium.
Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but
retain the same basic chemical structure as a naturally occurring amino acid. Amino acid
mimetics refers to chemical compounds that have a structure that is different from the
general chemical structure of an amino acid, but that function in a manner similar to a
naturally occurring amino acid.

Amino acids may be referred to herein by either their commonly known
three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB
Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their 
commonly accepted single-letter codes.

"Conservatively modified variants" applies to both amino acid and nucleic
acid sequences. With respect to particular nucleic acid sequences, conservatively modified
variants refers to those nucleic acids which encode identical or essentially identical amino 
acid sequences, or where the nucleic acid does not encode an amino acid sequence, to 
essentially identical sequences. Because of the degeneracy of the genetic code, a large 
number of functionally identical nucleic acids encode any given protein. For instance, the
codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every
position where an alanine is specified by a codon, the codon can be altered to any of the
corresponding codons described without altering the encoded polypeptide. Such nucleic
acid variations are "silent variations," which are one species of conservatively modified
variations. Every nucleic acid sequence herein which encodes a polypeptide also describes
every possible silent variation of the nucleic acid. One of skill will recognize that each
codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine,
and UGG, which is ordinarily the only codon for tryptophan) can be modified to yield a
functionally identical molecule. Accordingly, each silent variation of a nucleic acid which 
encodes a polypeptide is implicit in each described sequence. 
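
By way of non-limiting illustration, the enumeration of silent variations may be sketched as follows. The codon table is deliberately partial (alanine, methionine and tryptophan only, written in the RNA alphabet), and the names SYNONYMOUS_CODONS and silent_variants are hypothetical and used here for illustration only.

```python
from itertools import product

# Partial codon table (RNA alphabet); an illustrative assumption that covers
# only the three amino acids named in the text above.
SYNONYMOUS_CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCU"],  # alanine: four interchangeable codons
    "M": ["AUG"],                       # methionine: ordinarily a single codon
    "W": ["UGG"],                       # tryptophan: ordinarily a single codon
}


def silent_variants(peptide):
    """Return every RNA sequence that encodes `peptide` under the table above."""
    choices = [SYNONYMOUS_CODONS[residue] for residue in peptide]
    return ["".join(codons) for codons in product(*choices)]


# Met-Ala-Trp has 1 x 4 x 1 = 4 silent variants; only the alanine codon varies.
variants = silent_variants("MAW")
```

Each returned sequence encodes the same polypeptide, which is the sense in which every silent variation is implicit in a described sequence.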

As to amino acid sequences, one of skill will recognize that individual
substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein
sequence which alters, adds or deletes a single amino acid or a small percentage of amino
acids in the encoded sequence is a "conservatively modified variant" where the alteration
results in the substitution of an amino acid with a chemically similar amino acid.
Conservative substitution tables providing functionally similar amino acids are well known
in the art. Such conservatively modified variants are in addition to and do not exclude
polymorphic variants, interspecies homologs, and alleles of the invention.

The following eight groups each contain amino acids that are conservative
substitutions for one another: Alanine (A), Glycine (G); Aspartic acid (D), Glutamic acid
(E); Asparagine (N), Glutamine (Q); Arginine (R), Lysine (K); Isoleucine (I), Leucine (L),
Methionine (M), Valine (V); Phenylalanine (F), Tyrosine (Y), Tryptophan (W); Serine (S),
Threonine (T); and Cysteine (C), Methionine (M) (see, e.g., Creighton, Proteins (1984)
Freeman).
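
As an illustrative sketch only, the eight groups listed above can be encoded as sets and used to test whether two equal-length polypeptides differ only by conservative substitutions. The function names are hypothetical, and treating identical residues as acceptable is an assumption made for the purposes of the comparison.

```python
# The eight conservative substitution groups, one-letter codes as in the text.
CONSERVATIVE_GROUPS = [
    set("AG"), set("DE"), set("NQ"), set("RK"),
    set("ILMV"), set("FYW"), set("ST"), set("CM"),
]


def is_identical_or_conservative(a, b):
    """True if residues a and b are identical or share one of the eight groups."""
    if a == b:
        return True
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def differs_only_conservatively(seq1, seq2):
    """True if two equal-length sequences differ only by conservative substitutions."""
    if len(seq1) != len(seq2):
        return False
    return all(is_identical_or_conservative(a, b) for a, b in zip(seq1, seq2))
```

Methionine appears in two groups, so M may pair with I, L or V and also with C, while C does not pair with I, L or V.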

The term "recombinant" when used with reference, e.g., to a cell, or nucleic 
acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been 
modified by the introduction of a heterologous nucleic acid or protein or the alteration of a
native nucleic acid or protein, or that the cell is derived from a cell so modified. Thus, for
example, recombinant cells express genes that are not found within the native (non- 
recombinant) form of the cell or express native genes that are otherwise abnormally 
expressed, under expressed or not expressed at all. See Ausubel (ed. 1993) Current 
Protocols in Molecular Biology Wiley. 

A "promoter" is defined as an array of nucleic acid control sequences that
direct transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic
acid sequences near the start site of transcription, such as, in the case of a polymerase II
type promoter, a TATA element. A promoter also optionally includes distal enhancer or
repressor elements, which can be located as much as several thousand base pairs from the
start site of transcription. A "constitutive" promoter is a promoter that is active under most
environmental and developmental conditions. An "inducible" promoter is a promoter that is
active under environmental or developmental regulation. The term "operably linked" refers
to a functional linkage between a nucleic acid expression control sequence (such as a
promoter, or array of transcription factor binding sites) and a second nucleic acid sequence,
wherein the expression control sequence directs transcription of the nucleic acid
corresponding to the second sequence. See, e.g., Lodish, et al. (2000) Mol. Cell Biol. (4th
ed.) Freeman; Alberts, et al. (1994) Mol. Biol. Cell Garland. 

The term "heterologous" when used with reference to portions of a nucleic 
acid indicates that the nucleic acid comprises two or more subsequences that are not found
in the same relationship to each other in nature. For instance, the nucleic acid is typically
recombinantly produced, having two or more sequences from unrelated genes arranged to
make a new functional nucleic acid, e.g., a promoter from one source and a coding region
from another source. Similarly, a heterologous protein indicates that the protein comprises
two or more subsequences that are not found in the same relationship to each other in nature
(e.g., a fusion protein).

An "expression vector" is a nucleic acid construct, generated recombinantly 
or synthetically, with a series of specified nucleic acid elements that permit transcription of
a particular nucleic acid in a host cell. The expression vector can be part of a plasmid,
virus, or nucleic acid fragment. Typically, the expression vector includes a nucleic acid to 
be transcribed operably linked to a promoter. 

The term "identify" in the context of the invention means to be able to 
recognize a particular gene expression pattern as being characteristic of a particular cell,
tissue, organ, or physiological state; in the case of testing for compatibility of transplant
donors and recipients, the gene expression pattern may be characteristic of a particular
individual.

The terms "identical" or percent "identity," in the context of two or more
nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that
are the same or have a specified percentage of amino acid residues or nucleotides that are
the same (i.e., 60% identity, 65%, 70%, 75%, 80%, preferably 85%, 90%, 91%, 92%, 93%,
94%, 95%, 96%, 97%, 98%, 99% or higher identity to a nucleotide sequence such as those
of Tables 2-5, or to an amino acid sequence encoded by a polynucleotide of Tables 2-5),
when compared and aligned for maximum correspondence over a comparison window, or
designated region as measured using one of the following sequence comparison algorithms
or by manual alignment and visual inspection. Such sequences are then said to be
"substantially identical." This definition also refers to the complement of a test sequence.
Preferably, the identity exists over a region that is at least about 25 amino acids or
nucleotides in length, or more preferably over a region that is 50-100 amino acids or
nucleotides in length or larger, e.g., 200-500 or more. See, e.g., Baxevanis, et al. (2001)
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins Wiley; Mount
(2000) Bioinformatics: Sequence and Genome Analysis CSH Press; Ewens and Grant
(2001) Statistical Methods in Bioinformatics: An Introduction Springer-Verlag; Sensen (ed.
2002) Essentials of Genomics and Bioinformatics Wiley.
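
For illustration, percent identity over a pair of sequences that have already been aligned may be computed as sketched below. The function name percent_identity is hypothetical, and the choice of denominator (all aligned columns, including columns with a gap in one sequence) is an assumption; other conventions exist and yield different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)
```

For example, two aligned 8-mers that differ at a single position are 87.5% identical under this convention.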

For sequence comparison, typically one sequence acts as a reference
sequence, to which test sequences are compared. When using a sequence comparison
algorithm, test and reference sequences are entered into a computer, subsequence
coordinates are designated, if necessary, and sequence algorithm program parameters are
designated. Default program parameters can be used, or alternative parameters can be
designated. The sequence comparison algorithm then calculates the percent sequence
identities for the test sequences relative to the reference sequence, based on the program
parameters. For sequence comparison of nucleic acids and proteins, the BLAST and
BLAST 2.0 algorithms and the default parameters discussed below are used.

A "comparison window", as used herein, includes reference to a segment of 
any one of the number of contiguous positions selected from the group consisting of from 
20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a 
sequence may be compared to a reference sequence of the same number of contiguous
positions after the two sequences are optimally aligned. Methods of alignment of sequences
for comparison are well-known in the art. Optimal alignment of sequences for comparison
can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl.
Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J.
Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc.
Natl. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these
algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software
Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual
alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology
(Ausubel et al., eds. 2001 supplement)).
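
A minimal sketch of a local homology score in the manner of Smith & Waterman is given below for illustration. The scoring values (match, mismatch, and a linear gap penalty) are assumptions chosen for readability and are not parameters recited in this specification; the function returns only the best local score, not the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows of the matrix is sufficient when the score alone is needed; recovering the aligned residues would require the full matrix and a traceback.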

Preferred examples of algorithms that are suitable for determining percent
sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms,
which are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) and Altschul et
al., Nuc. Acids Res. 25:3389-3402 (1997), respectively. BLAST and BLAST 2.0 are used, with
the parameters described herein, to determine percent sequence identity for the nucleic acids
and proteins of the invention. Software for performing BLAST analyses is publicly
available through the National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring
sequence pairs (HSPs) by identifying short words of length W in the query sequence, which
either match or satisfy some positive-valued threshold score T when aligned with a word of
the same length in a database sequence. T is referred to as the neighborhood word score
threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for 
initiating searches to find longer HSPs containing them. The word hits are extended in both 
directions along each sequence for as far as the cumulative alignment score can be
increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters
M (reward score for a pair of matching residues; always > 0) and N (penalty score for 
mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to 
calculate the cumulative score. Extension of the word hits in each direction is halted
when: the cumulative alignment score falls off by the quantity X from its maximum
achieved value; the cumulative score goes to zero or below, due to the accumulation of one
or more negative-scoring residue alignments; or the end of either sequence is reached. The
BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the
alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength
(W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands. For
amino acid sequences, the BLASTP program uses as defaults a wordlength of 3, an
expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc.
Natl. Acad. Sci. USA 89:10915 (1992)), alignments (B) of 50, expectation (E) of 10, M=5,
N=-4, and a comparison of both strands.
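
The extension step described above may be illustrated by the following simplified sketch, which extends an exact nucleotide word hit to the right without gaps, using M=5 and N=-4 and the three halting conditions recited. It is not the BLAST implementation: the function name extend_right and the default value of X are assumptions, the neighborhood threshold T is not modeled, and extension to the left would proceed symmetrically.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 reward=5, penalty=-4, x_drop=20):
    """Extend an exact word hit rightward without gaps.

    Returns (best_score, best_length), where best_length counts aligned
    positions from the start of the word hit.
    """
    score = reward * word_len          # the seed word is an exact match
    best_score, best_len = score, word_len
    i = word_len
    # Halting condition 3: the end of either sequence is reached.
    while q_start + i < len(query) and s_start + i < len(subject):
        same = query[q_start + i] == subject[s_start + i]
        score += reward if same else penalty
        i += 1
        if score > best_score:
            best_score, best_len = score, i
        # Halting conditions 1 and 2: score fell off by X from its maximum,
        # or the cumulative score reached zero or below.
        elif best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_len
```

A smaller X stops the extension sooner after a run of mismatches, which trades sensitivity for speed in the manner the text attributes to the parameters W, T, and X.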

The BLAST algorithm also performs a statistical analysis of the similarity 
between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l Acad. Sci. USA 90:5873-
5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest
sum probability (P(N)), which provides an indication of the probability by which a match 
between two nucleotide or amino acid sequences would occur by chance. For example, a 
nucleic acid is considered similar to a reference sequence if the smallest sum probability in 
a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2, 
more preferably less than about 0.01, and most preferably less than about 0.001. 

An indication that two nucleic acid sequences or polypeptides are 
substantially identical is that the polypeptide encoded by the first nucleic acid is 
immunologically cross reactive with the antibodies raised against the polypeptide encoded 
by the second nucleic acid, as described below. Thus, a polypeptide is typically 
substantially identical to a second polypeptide, for example, where the two peptides differ 
only by conservative substitutions. Another indication that two nucleic acid sequences are 
substantially identical is that the two molecules or their complements hybridize to each 
other under stringent conditions, as described below. Yet another indication that two 
nucleic acid sequences are substantially identical is that the same primers can be used to 
amplify the sequence.

The phrase "selectively (or specifically) hybridizes to" refers to the
binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence
under stringent hybridization conditions when that sequence is present in a complex mixture
(e.g., total cellular or library DNA or RNA). See, e.g., Andersen (1998) Nucleic Acid
Hybridization Springer-Verlag; Ross (ed. 1997) Nucleic Acid Hybridization Wiley.

The phrase "stringent hybridization conditions" refers to conditions under 
which a probe will hybridize to its target subsequence, typically in a complex mixture of 
nucleic acid, but to no other sequences. Stringent conditions are sequence-dependent and 
will be different in different circumstances. Longer sequences hybridize specifically at
higher temperatures. An extensive guide to the hybridization of nucleic acids is found in
Tijssen, Techniques in Biochemistry and Molecular Biology— Hybridization with Nucleic 
Probes, "Overview of principles of hybridization and the strategy of nucleic acid assays" 
(1993). Generally, stringent conditions are selected to be about 5-10°C lower than the 
thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm
is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which
50% of the probes complementary to the target hybridize to the target sequence at
equilibrium (as the target sequences are present in excess, at Tm, 50% of the probes are
occupied at equilibrium). Stringent conditions will be those in which the salt concentration
is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration
(or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes
(e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50
nucleotides). Stringent conditions may also be achieved with the addition of destabilizing
agents such as formamide. For high stringency hybridization, a positive signal is at least
two times background, preferably 10 times background hybridization. Exemplary high
stringency or stringent hybridization conditions include: 50% formamide, 5x SSC and 1%
SDS incubated at 42°C or 5x SSC and 1% SDS incubated at 65°C, with a wash in 0.2x
SSC and 0.1% SDS at 65°C. For PCR, a temperature of about 36°C is typical for low
stringency amplification, although annealing temperatures may vary between about 32°C
and 48°C depending on primer length. For high stringency PCR amplification, a
temperature of about 62°C is typical, although high stringency annealing temperatures can
range from about 50-65°C, depending on the primer length and specificity. Typical cycle
conditions for both high and low stringency amplifications include a denaturation phase of
90-95°C for 30-120 sec, an annealing phase lasting 30-120 sec, and an extension phase of
about 72°C for 1-2 min.
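
By way of a hedged example, the selection of a hybridization temperature about 5-10°C below the Tm may be sketched as follows. This specification does not recite a formula for Tm; the estimate used here is the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is an assumption introduced for illustration and is appropriate only for short oligonucleotides. Both function names are hypothetical.

```python
def wallace_tm(oligo):
    """Rough Tm estimate in degrees C for a short DNA oligonucleotide.

    Wallace rule of thumb: 2 C per A/T plus 4 C per G/C. This is an
    illustrative approximation, not a formula taken from the specification.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature_range(tm):
    """Temperature window about 5-10 C below Tm, as (low, high) in degrees C."""
    return (tm - 10.0, tm - 5.0)
```

For a 20-mer with equal numbers of A, C, G and T the estimate is 60°C, giving a window of 50-55°C; salt and formamide concentrations, which the estimate ignores, shift the true Tm.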

Nucleic acids that do not hybridize to each other under stringent conditions 
are still substantially identical if the polypeptides that they encode are substantially
identical. This occurs, for example, when a copy of a nucleic acid is created using the
maximum codon degeneracy permitted by the genetic code. In such cases, the nucleic acids
typically hybridize under moderately stringent hybridization conditions. Exemplary
"moderately stringent hybridization conditions" include a hybridization in a buffer of 40%
formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 1X SSC at 45°C. A positive
hybridization is at least twice background. Those of ordinary skill will readily recognize
that alternative hybridization and wash conditions can be utilized to provide conditions of 
similar stringency. 

Introduction 

In accordance with the objects outlined above, the present invention provides
materials and methods for characterizing the nature of biological samples, thereby
permitting one to identify a biological sample and/or evaluate its physiological state. In
particular, the invention provides novel methods for diagnosis and treatment of colon and/or
rectal cancer (e.g., colorectal cancer), including metastatic colorectal cancers, as well as
methods for screening for compositions which modulate colorectal cancer. The method is
also useful for differentiating between particular stages of cancer, for example Dukes' stage
A, B, C, or D colorectal cancers. The method is also effective for determining the origin of
metastatic cancer.

The methods of the present invention allow one to compare a set of genes
expressed in a biological sample with a reference set, and thereby to identify a cell culture,
tissue or organ from which a biological sample is derived. Alternatively, the comparison
may yield information useful for diagnosing the health status of a tissue or organ sample. In
some embodiments the invention permits the prognostic evaluation of a patient with
cancer, particularly colorectal cancer. In other embodiments the invention provides a
method for monitoring the progress of therapeutic intervention to cure metastatic colorectal
cancer.

The invention comprises reference sets of classifier genes whose 
characteristic patterns of expression can be used to determine the physiological state of a
biological sample. The genes comprising the reference sets are selected for their high signal
to noise ratio in a reference sample. These genes are considered "maximally informative
genes" or "classifier genes". Any particular classifier gene of a reference set may or may
not be uniquely expressed in a particular biological sample. However, the level of
expression of such a gene, and its relationship within a pattern of co-expressed genes, creates
a unique profile that can be used to infer the identity and/or physiology of a biological
sample. Reference sets representing the gene expression pattern characteristic of metastatic
tumors or tumors with metastatic potential are shown in Tables 1-6. The genes
indicative of a tumor with metastatic potential may be either up-regulated or down-regulated
with respect to samples from tumor or tissue that does not show metastatic
potential.
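
One way in which genes might be ranked by signal to noise ratio is sketched below for illustration. The passage above does not define the statistic; the form used here (difference of class means divided by the sum of class standard deviations) is an assumption, and the function names and gene labels are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(class_a, class_b):
    """(mean_a - mean_b) / (sd_a + sd_b) for one gene's expression values.

    Assumes at least two values per class and non-zero total spread.
    """
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def rank_classifier_genes(expr_a, expr_b, top_n):
    """Return the top_n gene names by absolute signal-to-noise score.

    expr_a and expr_b map gene name -> list of expression values per class.
    Up-regulated and down-regulated genes are both retained.
    """
    scores = {gene: signal_to_noise(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)[:top_n]
```

Ranking on the absolute value keeps genes that are down-regulated in one class alongside those that are up-regulated, consistent with the statement that classifier genes may move in either direction.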

Classifier genes may be a portion of a larger polynucleotide comprising a
polynucleotide as shown in Tables 1-6 (e.g., a full length mRNA or cDNA).
Alternatively, classifier genes may be a portion of a polypeptide encoded by a larger
polynucleotide comprising a polynucleotide as shown in Tables 1-6. "Genes" in this
context includes coding regions, non-coding regions, and mixtures of coding and non-coding
regions. Accordingly, as will be appreciated by those in the art, using the sequences
provided herein, extended sequences, in either direction, of the metastatic colorectal cancer
genes can be obtained, using techniques well known in the art for cloning either longer
sequences or the full length sequences; see Current Protocols in Molecular Biology
(Ausubel et al., eds., 1994). Selection of an appropriate portion of a polynucleotide for
sequence hybridization, or of an appropriate portion of a polypeptide for immunological or
other recognition, is dictated by optimal hybridization or immunogenicity and may be
accomplished by the methods described herein, e.g., microarray techniques.

Selection of the classifier polynucleotide or polypeptide is in accordance
with the particular analysis to which the biological sample will be subjected. A general
property of classifier genes and their corresponding polypeptides is that expression of
defined sets of classifier genes can be compared with the reference sets of Tables 1-6 to
determine the metastatic potential of a biological sample. In some applications, it is
desirable for the classifier gene to be tissue-specific or disease-specific, that is, expressed
exclusively in the tissue, cells or disease of interest. In other applications, the classifier
gene may be expressed predominantly in one tissue type, or disease state, but could also be
expressed in other tissues, or in a healthy state, but in a different relationship with the other
classifier genes of the set. For example, a particular classifier gene may be expressed at
different levels in a biological sample comprising a colon liver metastasis, compared to a
non-metastatic colon cancer (e.g., Dukes' stage B colorectal cancer that was cured by surgery).

Classifier genes may encode either intracellular molecules, e.g., cellular
nucleic acids, intracellular proteins, and the intracellular domains of transmembrane
proteins, or may encode extracellular molecules, such as the extracellular domains of
transmembrane proteins. Intracellular and extracellular classifier genes are equally suitable. 

Protein expression patterns may be evaluated by methods other than 
hybridization or antibody based detection. For example, chromatographic separation of
proteins; ELISA or Ab based separations; affinity chromatography; 2D gels; and general protein
separation methods with analysis of individual "classifier" proteins all may be used
(Palzkill (2002) Proteomics Kluwer; Liebler (2001) Introduction to Proteomics: Tools for
the New Biology Humana; Suhai (ed. 2000) Genomics and Proteomics: Functional and
Computational Aspects Kluwer; Rabilloud (ed. 2001) Proteome Research: Two
Dimensional Gel Electrophoresis and Detection Methods Springer-Verlag; Hames and
Rickwood (eds. 2001) Gel Electrophoresis of Proteins: A Practical Approach Oxford Univ.
Press; James (ed. 2000) Proteome Research: Mass Spectrometry Springer-Verlag;
Kyriakidis, et al. (eds. 2001) Proteome and Protein Analysis Springer-Verlag).

Gene Expression Profiling

A first step in the methods of the invention is performing gene expression 
profiling of a sample of interest. Gene expression profiling refers to examining expression 
of one or more RNAs or proteins in a cell or tissue. Often at least or up to 10, 100, 1000,
10,000 or more different RNAs or proteins are examined in a single experiment. The
profile of the sample is then compared with the reference sets of Tables 1-6. In some
embodiments, a given classifier gene may have a similar expression pattern in different
cells. In other embodiments, the gene of interest may have lower or higher expression in
one cell, tissue, organ or physiological state as compared to another.
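
Purely as an illustration of comparing a sample profile with reference sets, the sketch below assigns a sample to the reference profile with which it is most strongly correlated. The use of Pearson correlation, the function names, and the gene labels are assumptions; the specification does not prescribe a particular similarity measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def closest_reference(sample, references):
    """Name of the reference profile best correlated with the sample.

    sample maps gene -> expression; references maps name -> (gene -> expression).
    Every reference is assumed to cover all genes present in the sample.
    """
    genes = sorted(sample)
    sample_values = [sample[g] for g in genes]
    return max(
        references,
        key=lambda name: pearson(sample_values, [references[name][g] for g in genes]),
    )
```

Correlation compares the pattern across the classifier genes rather than any single gene's level, which reflects the point that a gene need not be uniquely expressed to be informative.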

The evaluating assays of the invention may be of any type. High-density
expression arrays can be used, but other techniques are also contemplated. Methods for
examining gene expression, often but not always hybridization based, include, e.g.,
Northern blots; dot blots; primer extension; nuclease protection; subtractive hybridization
and isolation of non-duplexed molecules using, e.g., hydroxyapatite; solution hybridization;
filter hybridization; amplification techniques such as RT-PCR and other PCR-related
techniques such as differential display, LCR, AFLP, RAP, etc. (see, e.g., U.S. Patents
4,683,195 and 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et
al., eds., 1990); Liang & Pardee, Science 257:967-971 (1992); Hubank & Schatz, Nuc. Acids
Res. 22:5640-5648 (1994); Perucho et al., Methods Enzymol. 254:275-290 (1995));
fingerprinting, e.g., with restriction endonucleases (Ivanova et al., Nuc. Acids Res.
23:2954-2958 (1995); Kato, Nuc. Acids Res. 23:3685-3690 (1995); and Shimkets et al.,
Nature Biotechnology 17:798-803; see also US Patent No. 5,871,697); and the use of
structure specific endonucleases (see, e.g., De Francesco, The Scientist 12:16 (1998)).
mRNA expression can also be analyzed using mass spectrometry techniques (e.g., MALDI
or SELDI), liquid chromatography, and capillary gel electrophoresis, as described below.

For a general description of these techniques, see also Sambrook et al., 
Molecular Cloning, A Laboratory Manual (2nd ed. 1989), see, e.g., pages 7.37-7.39, 7.53- 
7.54, 7.58-7.66, and 7.71-7.79; Kriegler, Gene Transfer and Expression: A Laboratory 
Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al, eds., 1994). 

Techniques have been developed that expedite expression analysis and
sequencing of large numbers of nucleic acid samples. For example, nucleic acid arrays
have been developed for high density and high throughput expression analysis (see, e.g.,
Granjeaud et al., BioEssays 21:781-790 (1999); Lockhart & Winzeler, Nature 405:827-836
(2000)). Nucleic acid arrays refer to large numbers (e.g., tens, hundreds, thousands, tens of
thousands, or more) of different nucleic acid probes bound to solid substrates, such as
nylon, glass, or silicon wafers (see, e.g., Fodor et al., Science 251:767-773 (1991); Brown &
Botstein, Nature Genet. 21:33-37 (1999); Eberwine, Biotechniques 20:584-591 (1996)). A
single array can contain probes corresponding to an entire genome, to all genes expressed
by the genome, or to a selected subset of genes. The arrays can be DNA
oligonucleotide arrays (e.g., GeneChip®, see, e.g., Lipshutz et al., Nat. Genet. 21:20-24
(1999)), mRNA arrays, cDNA arrays, EST arrays, or optically encoded arrays on fiber optic
bundles (e.g., BeadArray™). The samples applied to the arrays for expression analysis can
be, e.g., PCR products, cDNA, mRNA, etc.

Additional techniques for rapid gene sequencing and analysis of gene
expression include, for example, SAGE (serial analysis of gene expression). For SAGE, a
short segment of the original transcript (typically about 14 bp) is cleaved from the transcript
for analysis. This sequence contains sufficient information to uniquely identify a transcript,
and is referred to as a sequence tag. Sequence tags are collected from all the mRNA
transcripts of a sample by binding of the poly-A tail of the mRNAs to a poly-T column.
The sequence tags are linked together to form long concatameric molecules that are cloned,
amplified, and sequenced. Analysis of the resulting sequence data will identify each
transcript and reveal the number of times a particular tag is observed. Thus the method
permits the expression level of the corresponding transcript to be determined (see, e.g.,
Velculescu et al., Science 270:484-487 (1995); Velculescu et al., Cell 88 (1997); and de
Waard et al., Gene 226:1-8 (1999)).
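
By way of illustration only, the tag-counting step of SAGE can be sketched in a few
lines of Python. The sketch assumes a simplified input in which the sequenced concatamer is
a plain string of back-to-back tags of fixed length; real SAGE data also contain anchoring
enzyme sites and ditag structure, which are not modeled here. The function name
count_sage_tags and the example sequences are hypothetical.

```python
from collections import Counter


def count_sage_tags(concatamer, tag_length=14):
    """Split a concatamer into fixed-length tags and count each distinct tag.

    Simplifying assumption: tags are laid end to end with no linker or
    anchoring-enzyme sequence between them. Trailing bases that do not
    fill a whole tag are ignored.
    """
    usable = len(concatamer) - len(concatamer) % tag_length
    tags = [concatamer[i:i + tag_length] for i in range(0, usable, tag_length)]
    return Counter(tags)


# Hypothetical 14 bp tags standing in for two different transcripts.
tag_a = "ACGTACGTACGTAC"
tag_b = "TTTTGGGGCCCCAA"
counts = count_sage_tags(tag_a + tag_b + tag_a)
```

The count obtained for each tag serves as a proxy for the abundance of the transcript
from which the tag was derived.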

Embodiments of the invention 

As described herein, each of these techniques can be used, alone or in
combination, to identify a classifier gene or set of classifier genes expressed in a cell, tissue,
organ or disease state. Classifier genes may encode, for example, ion channels, receptors, G
protein coupled receptors, cytokines, chemokines, signal transduction proteins,
housekeeping proteins, cell cycle regulation proteins, transcription factors, zinc finger
proteins, chromatin remodeling proteins, etc. Once a classifier gene or set of classifier
genes is analyzed in a particular biological sample, the results are compared to the reference
sets of Tables 1-6. The physiological state of the sample can then be determined.
Information gained from the analysis of classifier genes in a sample can be used to
diagnose the potential for the disease to progress or the actual stage to which a disease has
progressed (e.g., metastatic colorectal cancer), or to monitor the efficacy of therapeutic
regimens given to a patient.

RNA or protein can be isolated from a biological sample and assayed using
any technique. For example, they can be isolated from fresh or frozen biopsy, from
formalin-fixed tissue, or from body fluids, such as blood, plasma, serum, urine, or sputum. Of
course, the present invention is not limited by the nature of the samples or the nature of the
comparison, and will find use in a variety of applications.

The treatment of cancer has been hampered by the fact that there is
considerable heterogeneity even within one type of cancer. Some cancers, for example,
have the ability to invade tissues and display an aggressive course of growth characterized
by metastases. These tumors generally are associated with a poor outcome for the patient.
And yet, without a means of identifying such tumors and distinguishing such tumors from
non-invasive cancer, the physician is at a loss to change and/or optimize therapy.

The present invention may be used to compare normal tissue with cancer
tissue, as well as to differentiate between cancer tissue that is non-metastatic, cancer tissue
that is metastatic, and cancer tissue that has a potential to metastasize.

In yet another embodiment, the present invention may be used to determine
the health status of a cell culture, tissue, or organ.

The present invention also finds use in drug screening. For example,
samples treated with different candidate drugs can be subjected to the methods of the
present invention to determine the ability of the compounds to alter the expression of
classifier genes known to be implicated in the disease state. For example, if a particular
classifier gene is known to be over-expressed in cancer cells, one can look for drugs that
reduce the expression of the suspect gene or set of genes to normal levels.

Analysis of gene expression may be at the gene transcript or the protein
level. The amount of gene expression may be evaluated using nucleic acid probes to the
DNA or RNA equivalent of the gene transcript. Alternatively, the final gene product itself
(protein) can be monitored, for example, with antibodies to the classifier protein and
standard immunoassays (ELISAs, etc.) or other techniques, including mass spectrometry
assays, 2D gel electrophoresis assays, etc. Proteomics and separation techniques may also
allow quantification of expression.

In a preferred embodiment, gene expression monitoring is performed
simultaneously on a number of genes. Multiple protein expression monitoring can be
performed as well.

In one embodiment, the classifier gene nucleic acid probes are attached to 
biochips as outlined herein for the detection and quantification of nucleotide sequences in a 
particular cell or tissue. 

General recombinant DNA methods 

This invention relies on routine techniques in the field of recombinant
genetics. Basic texts disclosing the general methods of use in this invention include
Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene
Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in
Molecular Biology (Ausubel et al., eds., 1994).

For nucleic acids, sizes are given in either kilobases (kb) or base pairs (bp).
These are estimates derived from agarose or acrylamide gel electrophoresis, from sequenced
nucleic acids, or from published DNA sequences. For proteins, sizes are given in
kilodaltons (kD) or amino acid residue numbers. Protein sizes are estimated from gel
electrophoresis, from sequenced proteins, from derived amino acid sequences, or from
published protein sequences.

Oligonucleotides that are not commercially available can be chemically
synthesized according to the solid phase phosphoramidite triester method first described by
Beaucage & Caruthers, Tetrahedron Letts. 22:1859-1862 (1981), using an automated
synthesizer, as described in Van Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984).
Purification of oligonucleotides is by either native acrylamide gel electrophoresis or by
anion-exchange HPLC as described in Pearson & Regnier, J. Chrom. 255:137-149 (1983).

The sequence of the cloned genes and synthetic oligonucleotides can be
verified after cloning using, e.g., the chain termination method for sequencing
double-stranded templates of Wallace et al., Gene 16:21-26 (1981).

Cloning methods for the isolation of nucleotide sequences

In general, nucleic acid sequences are cloned from cDNA and genomic DNA
libraries or isolated using amplification techniques such as polymerase chain reaction
(PCR). The primers used for PCR may amplify either the full length sequence or a probe of
one to several hundred nucleotides, which is subsequently used to screen a library for
full-length clones. Various combinations of oligonucleotides can be used to amplify coding and
non-coding regions of the nucleotide sequence.

Nucleic acids can also be isolated from expression libraries using antibodies 
as probes. Polyclonal or monoclonal antibodies can be raised using the translation of a 
coding sequence, or any immunogenic portion thereof. 

To make a cDNA library, one should choose a source that is rich in mRNA
of the molecule one desires to clone. The mRNA is then made into cDNA using reverse
transcriptase, ligated into a recombinant vector, and transfected into a recombinant host for
propagation, screening and cloning. Methods for making and screening cDNA libraries are
well known (see, e.g., Gubler & Hoffman, Gene 25:263-269 (1983); Sambrook et al., supra;
Ausubel et al., supra).

For a genomic library, the DNA is extracted from the tissue and either
mechanically sheared or enzymatically digested to yield fragments of about 12-20 kb. The
fragments are then separated by gradient centrifugation from undesired sizes and are
constructed in bacteriophage lambda vectors. These vectors and phage are packaged in
vitro. Recombinant phage are analyzed by plaque hybridization as described in Benton &
Davis, Science 196:180-182 (1977). Colony hybridization is carried out as generally
described in Grunstein et al., Proc. Natl. Acad. Sci. USA, 72:3961-3965 (1975).

An alternative method of isolating specific nucleic acids and their orthologs,
alleles, mutants, polymorphic variants, and conservatively modified variants combines the
use of synthetic oligonucleotide primers and amplification of an RNA or DNA template (see
U.S. Patents 4,683,195 and 4,683,202; PCR Protocols: A Guide to Methods and
Applications (Innis et al., eds, 1990)). Methods such as polymerase chain reaction (PCR)
and ligase chain reaction (LCR) can be used to amplify nucleic acid sequences of target
molecules directly from mRNA, from cDNA, from genomic libraries or cDNA libraries.
Degenerate oligonucleotides can be designed to amplify target molecule homologs using
the sequences provided herein. Restriction endonuclease sites can be incorporated into the
primers. Polymerase chain reaction or other in vitro amplification methods may also be
useful, for example, to clone nucleic acid sequences that code for proteins to be expressed,
to make nucleic acids to use as probes for detecting the presence of target
molecule-encoding mRNA in physiological samples, for nucleic acid sequencing, or for other
purposes. Genes amplified by PCR can be purified from agarose gels and
cloned into an appropriate vector.

Once isolated, the nucleic acid is typically cloned into intermediate vectors
before transformation into prokaryotic or eukaryotic cells for replication and/or expression.
These intermediate vectors are typically prokaryote vectors, e.g., plasmids, or shuttle
vectors.

Expression of cloned nucleotide sequences in prokaryotes and eukaryotes

To obtain high level expression of a cloned gene, one typically subclones the
gene into an expression vector that contains a strong promoter to direct transcription, a
transcription/translation terminator, and, for a nucleic acid encoding a protein, a ribosome
binding site for translational initiation. Suitable bacterial promoters are well known in the
art and described, e.g., in Sambrook et al. and Ausubel et al., supra. Bacterial expression
systems for expressing the target proteins are available in, e.g., E. coli, Bacillus sp., and
Salmonella (Palva et al., Gene 22:229-235 (1983); Mosbach et al., Nature 302:543-545
(1983)). Kits for such expression systems are commercially available. Eukaryotic
expression systems for mammalian cells, yeast, and insect cells are well known in the art
and are also commercially available.

Selection of the promoter used to direct expression of a heterologous nucleic
acid depends on the particular application. The promoter is preferably positioned about the
same distance from the heterologous transcription start site as it is from the transcription
start site in its natural setting. As is known in the art, however, some variation in this
distance can be accommodated without loss of promoter function.

In addition to the promoter, the expression vector typically contains a
transcription unit or expression cassette that contains all the additional elements required for
the expression of the target molecule-encoding nucleic acid in host cells. A typical
expression cassette thus contains a promoter operably linked to the nucleic acid sequence
encoding target molecules and signals required for efficient polyadenylation of the
transcript, ribosome binding sites, and translation termination. Additional elements of the
cassette may include enhancers and, if genomic DNA is used as the structural gene, introns
with functional splice donor and acceptor sites.

In addition to a promoter sequence, the expression cassette should also
contain a transcription termination region downstream of the structural gene to provide for
efficient termination. The termination region may be obtained from the same gene as the
promoter sequence or may be obtained from different genes.

The particular expression vector used to transport the genetic information
into the cell is not particularly critical. Any of the conventional vectors used for expression
in eukaryotic or prokaryotic cells may be used. Standard bacterial expression vectors
include plasmids such as pBR322 based plasmids, pSKF, pET23D, and fusion expression
systems such as MBP, GST, and LacZ. Epitope tags can also be added to recombinant
proteins to provide convenient methods of isolation, e.g., c-myc.

Expression vectors containing regulatory elements from eukaryotic viruses
are typically used in eukaryotic expression vectors, e.g., SV40 vectors, papilloma virus
vectors, and vectors derived from Epstein-Barr virus. Other exemplary eukaryotic vectors
include pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and any
other vector allowing expression of proteins under the direction of the CMV promoter,
SV40 early promoter, SV40 late promoter, metallothionein promoter, murine mammary
tumor virus promoter, Rous sarcoma virus promoter, polyhedrin promoter, or other
promoters shown effective for expression in eukaryotic cells.
Expression of proteins from eukaryotic vectors can also be regulated
using inducible promoters. With inducible promoters, expression levels are tied to the
concentration of inducing agents, such as tetracycline or ecdysone, by the incorporation of
response elements for these agents into the promoter. Generally, high level expression is
obtained from inducible promoters only in the presence of the inducing agent; basal
expression levels are minimal. Inducible expression vectors are often chosen if expression
of the protein of interest is detrimental to eukaryotic cells.

Some expression systems have markers that provide gene amplification, such
as thymidine kinase and dihydrofolate reductase. Alternatively, high yield expression
systems not involving gene amplification are also suitable, such as using a baculovirus
vector in insect cells, with a target molecule-encoding sequence under the direction of the
polyhedrin promoter or other strong baculovirus promoters.

The elements that are typically included in expression vectors also include a
replicon that functions in E. coli, a gene encoding antibiotic resistance to permit selection of
bacteria that harbor recombinant plasmids, and unique restriction sites in nonessential
regions of the plasmid to allow insertion of eukaryotic sequences. The particular antibiotic
resistance gene chosen is not critical; any of the many resistance genes known in the art are
suitable. The prokaryotic sequences are preferably chosen such that they do not interfere
with the replication of the DNA in eukaryotic cells, if necessary.

Standard transfection methods are used to produce bacterial, mammalian,
yeast or insect cell lines that express large quantities of target protein, which are then
purified using standard techniques (see, e.g., Colley et al., J. Biol. Chem. 264:17619-17622
(1989); Guide to Protein Purification, in Methods in Enzymology, vol. 182 (Deutscher, ed.,
1990)). Transformation of eukaryotic and prokaryotic cells is performed according to
standard techniques (see, e.g., Morrison, J. Bact. 132:349-351 (1977); Clark-Curtiss &
Curtiss, Methods in Enzymology 101:347-362 (Wu et al., eds, 1983)).

Any of the well-known procedures for introducing foreign nucleotide
sequences into host cells may be used. These include the use of calcium phosphate
transfection, polybrene, protoplast fusion, electroporation, biolistics, liposomes,
microinjection, plasmid vectors, viral vectors and any of the other well known methods for
introducing cloned genomic DNA, cDNA, synthetic DNA or other foreign genetic material
into a host cell (see, e.g., Sambrook et al., supra). It is only necessary that the particular
genetic engineering procedure used be capable of successfully introducing at least one gene
into the host cell capable of expressing the gene.

After the expression vector is introduced into the cells, the transfected cells
are cultured under conditions favoring expression of the gene or gene fragment. The
product of the expressed gene or gene fragment is then recovered from the culture using
standard techniques identified below.

Purification of classifier gene polypeptides 

Either naturally occurring or recombinant proteins can be purified and used
to generate antibodies. Naturally occurring proteins can be purified from a variety of
sources. However, in a preferred embodiment the proteins are isolated from mammalian
tissue. In a particularly preferred embodiment, the proteins are isolated from human tissue.
Recombinant classifier proteins can be purified from any suitable expression system.

The proteins may be purified to substantial purity by standard techniques,
including selective precipitation with such substances as ammonium sulfate; column
chromatography; immunopurification methods; and others (see, e.g., Scopes, Protein
Purification: Principles and Practice (1982); U.S. Patent No. 4,673,641; Ausubel et al.,
supra; and Sambrook et al., supra).

A number of procedures can be employed when recombinant proteins are
being purified; all are familiar to those of skill in the art. For example, proteins having
established molecular adhesion properties can be reversibly fused to another protein. With
the appropriate ligand, the protein of interest may be selectively adsorbed to a purification
column and then freed from the column in a relatively pure form. The fused protein is then
removed by enzymatic activity. Finally, if antibodies to a portion of the protein are
available, the protein may be purified using immunoaffinity columns.

Antibodies to classifier gene polypeptides 

Where the classifier gene product is a polypeptide encoded by a
polynucleotide of Tables 1-6, gene expression profiling can be examined using
antibodies to the expressed classifier proteins.

To make effective antibodies, the classifier protein should share at least one
epitope or determinant with the full length protein. By "epitope" or "determinant" herein is
typically meant a portion of a protein which will generate and/or bind an antibody or T-cell
receptor in the context of MHC. Thus, in most instances, antibodies made to a smaller
classifier protein will be able to bind to the full-length protein, particularly linear epitopes.
In a preferred embodiment, the epitope is unique; that is, antibodies generated to a unique
epitope show little or no cross-reactivity.

Both polyclonal and monoclonal antibodies may be raised against the
classifier proteins encoded by the classifier genes shown in the reference sets of Tables
1-6. Methods of producing polyclonal and monoclonal antibodies that react specifically
with specific proteins are known to those of skill in the art (see, e.g., Coligan, Current
Protocols in Immunology (1991); Harlow & Lane, supra; Goding, Monoclonal Antibodies:
Principles and Practice (2d ed. 1986); and Kohler & Milstein, Nature 256:495-497 (1975)).
Such techniques include antibody preparation by selection of antibodies from libraries of
recombinant antibodies in phage or similar vectors (see Winthrop et al., Q. J. Nucl. Med.
44:284-95 (2000)), as well as preparation of polyclonal and monoclonal antibodies by
immunizing rabbits or mice (see, e.g., Huse et al., Science 246:1275-1281 (1989); Ward et
al., Nature 341:544-546 (1989)). For some applications, recombinant antibody fragments
derived from monoclonal antibodies - such as single-chain antibodies, diabodies, and
minibodies - are preferred (see Wu and Yazaki, Q. J. Nucl. Med. 44:268-83 (2000)).

A number of immunogens comprising portions of classifier proteins encoded
by the classifier genes of Tables 1-6 may be used to produce antibodies specifically
reactive with classifier proteins. For example, recombinant classifier proteins, or an
antigenic fragment thereof, can be isolated as is known in the art. Recombinant protein can
be expressed in eukaryotic or prokaryotic cells, and then purified by well established
methods known in the art. Recombinant protein is the preferred immunogen for the
production of monoclonal or polyclonal antibodies. Alternatively, a synthetic peptide
derived from the sequences disclosed herein and conjugated to a carrier protein can be used
as an immunogen. Naturally occurring protein may also be used either in pure or impure form.
The product is then injected into an animal capable of producing antibodies. Either
monoclonal or polyclonal antibodies may be generated, for subsequent use in
immunoassays to measure the protein.

Methods of production of polyclonal antibodies are known to those of skill in
the art. An inbred strain of mice (e.g., BALB/C mice) or rabbits is immunized with the
protein using a standard adjuvant, such as Freund's adjuvant, and a standard immunization
protocol. The animal's immune response to the immunogen preparation is monitored by
taking test bleeds and determining the titer of reactivity to the immunogen. When
appropriately high titers of antibody to the immunogen are obtained, blood is collected from
the animal, and antisera are prepared. Further fractionation of the antisera to enrich for
antibodies reactive to the protein can be done if desired (see Harlow & Lane, supra).

Monoclonal antibodies and polyclonal sera are collected and titered against
the immunogen protein in an immunoassay, for example, a solid phase immunoassay with
the immunogen immobilized on a solid support. Typically, polyclonal antisera with a titer
of 10^4 or greater are selected and tested for their cross reactivity against non-homologous
proteins and other family proteins, using a competitive binding immunoassay. Specific
polyclonal antisera and monoclonal antibodies will usually bind with a Kd of at least about
0.1 mM, more usually at least about 1 µM, preferably at least about 0.1 µM or better, and
most preferably, 0.01 µM or better. Antibodies specific only for a particular protein
ortholog can also be made, by subtracting out other cross-reacting orthologs from a species
such as a non-human mammal.

Methods for comparing gene expression profiles with reference sets of the Tables 1-6 

Patterns of gene expression can be compared to the reference set of
Tables 1-6 manually (by a person) or by a computer or other machine. An algorithm can be
used to detect similarities and differences. The algorithm may score and compare, for
example, the genes which are expressed and the genes which are not expressed. If the genes
are expressed, the algorithm may further be used to quantify the expression by looking for
relative changes in intensity of expression of a particular gene. A variety of algorithms for
such comparisons are known in the art (see, e.g., Breiman L, Friedman JH, Olshen RA, and
Stone CJ. (1984) Classification and Regression Trees. Wadsworth and Brooks/Cole,
Monterey CA).
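
As a minimal sketch of scoring which genes are expressed and which are not, the Python
function below computes the fraction of shared genes whose expressed/not-expressed call
agrees between a sample and a reference set. The function name presence_concordance, the
boolean call representation, and the gene labels are illustrative assumptions and do not
come from Tables 1-6.

```python
def presence_concordance(sample_calls, reference_calls):
    """Fraction of shared genes with the same expressed/not-expressed call.

    Both arguments map a gene identifier to True (expressed) or
    False (not expressed). Genes missing from either mapping are ignored.
    """
    shared = sample_calls.keys() & reference_calls.keys()
    if not shared:
        raise ValueError("no genes in common between sample and reference")
    agreeing = sum(1 for gene in shared
                   if sample_calls[gene] == reference_calls[gene])
    return agreeing / len(shared)


# Hypothetical calls for four placeholder genes.
sample = {"GENE_A": True, "GENE_B": False, "GENE_C": True, "GENE_D": False}
reference = {"GENE_A": True, "GENE_B": False, "GENE_C": False, "GENE_D": False}
score = presence_concordance(sample, reference)
```

A score near 1 indicates that the sample and the reference set largely agree on which
classifier genes are expressed.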

Similarities in the gene expression profile of the classifier genes in a
biological sample and a reference set may be determined with reference to which genes are
expressed in both samples and/or which genes are not expressed in both samples.
Alternatively, the relative differences in intensity of expression of two or more classifier
genes in a sample may be a basis for deciding similarity or difference. Differences in gene
expression are considered significant when they are greater than 2-fold, 3-fold or 5-fold
from the value defined by expression in a reference set of classifier genes.
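
The fold-change criterion can be illustrated with the following Python sketch. It assumes
that expression levels are positive, linear-scale intensities and treats a change in either
direction (over- or under-expression) symmetrically; the function name
significant_fold_changes, the default threshold, and the gene labels are hypothetical.

```python
def significant_fold_changes(sample, reference, threshold=2.0):
    """Return {gene: sample/reference ratio} for genes exceeding the threshold.

    A gene is flagged when its level is more than `threshold`-fold higher
    or lower than the reference level. Genes with missing or non-positive
    levels are skipped because a ratio cannot be formed for them.
    """
    flagged = {}
    for gene, ref_level in reference.items():
        level = sample.get(gene)
        if level is None or level <= 0 or ref_level <= 0:
            continue
        ratio = level / ref_level
        fold = ratio if ratio >= 1 else 1 / ratio
        if fold > threshold:
            flagged[gene] = ratio
    return flagged


# Hypothetical intensities for three placeholder genes.
sample = {"GENE_A": 50.0, "GENE_B": 10.0, "GENE_C": 2.0}
reference = {"GENE_A": 10.0, "GENE_B": 8.0, "GENE_C": 10.0}
flagged = significant_fold_changes(sample, reference, threshold=2.0)
```

Here GENE_A (5-fold higher) and GENE_C (5-fold lower) would be flagged, while GENE_B
(1.25-fold) would not.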
Mathematical approaches can also be used to conclude whether similarities
or differences in the gene expression exhibited by different samples are significant. See,
e.g., Golub et al., Science 286, 531 (1999); Duda, et al. (2001) Pattern Classification Wiley;
and Hastie, et al. (2001) The Elements of Statistical Learning: Data Mining, Inference, and
Prediction Springer-Verlag. One approach to determining whether a sample is more similar to,
or has maximum similarity with, a given condition is to compute the vector angle between
the expression profile of the sample and that of one or more pools representing different
conditions for comparison; the pool with the smallest vector
angle is then chosen as the most similar to the biological sample among the pools compared.
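
A minimal Python sketch of the smallest-vector-angle comparison follows. It assumes that
each profile is a list of expression values for the same genes in the same order; the
function names vector_angle and most_similar_pool and the pool labels are hypothetical.

```python
import math


def vector_angle(a, b):
    """Angle in radians between two expression vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genes")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("angle is undefined for an all-zero profile")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def most_similar_pool(sample, pools):
    """Return the name of the pool whose profile makes the smallest angle."""
    return min(pools, key=lambda name: vector_angle(sample, pools[name]))


# Hypothetical three-gene profiles.
sample = [1.0, 2.0, 3.0]
pools = {
    "metastatic": [2.0, 4.0, 6.5],
    "non_metastatic": [3.0, 1.0, 0.5],
}
best = most_similar_pool(sample, pools)
```

Because the angle depends only on the direction of each vector, this measure compares the
relative pattern of expression across genes rather than overall signal intensity.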

The gene expression patterns of the tissue sample will be compared against
the expression patterns designated in Tables 1-6. This comparison will lead to the
determination of whether or not a sample has metastatic potential.

Differences in gene expression are considered significant when the
differences in mean expression across samples are detected with statistical significance and
such that the level of falsely detected significant genes is near zero (Efron B, Tibshirani R,
Storey JD, and Tusher V. (2001) Empirical Bayes analysis of a microarray experiment.
Journal of the American Statistical Association, 96:1151-1160).

Since the comparison of gene expression profiles can be made with
computers or other machines as well as manually, the invention also provides for the storage
and retrieval of a collection of data in a computer data storage apparatus, which can include
magnetic disks, optical disks, magneto-optical disks, DRAM, SRAM, SGRAM, SDRAM,
RDRAM, DDR RAM, magnetic bubble memory devices, and other data storage devices,
including CPU registers and on-CPU data storage arrays. Typically, the data records are
stored as a bit pattern in an array of magnetic domains on a magnetizable medium or as an
array of charge states or transistor gate states, such as an array of cells in a DRAM device
(e.g., each cell comprised of a transistor and a charge storage area, which may be on the
transistor). In one embodiment, the invention provides such storage devices, and computer
systems built therewith, comprising a bit pattern encoding a protein expression fingerprint
record comprising unique identifiers for at least 10 data records cross-tabulated with
source.

The invention preferably provides a method for identifying peptide or
nucleic acid sequences and determining the level of similarity or difference to a reference
set, comprising performing a computerized comparison between a peptide or nucleic acid
expression profiling record stored in or retrieved from a computer storage device or
database and a reference set. The comparison can include a comparison algorithm or
computer program embodiment thereof (e.g., FASTA, TFASTA, GAP, BESTFIT) and/or
the comparison may be of the absolute or relative amount of a peptide or nucleic acid
sequence in a pool of sequences determined from a polypeptide or nucleic acid sample of a
specimen.

The invention also provides a magnetic disk, such as an IBM-compatible
(DOS, Windows, Windows95/98/2000, Windows NT, OS/2) or other format (e.g., Linux,
SunOS, Solaris, AIX, SCO Unix, VMS, MV, Macintosh, etc.) floppy diskette or hard
(fixed, Winchester) disk drive, comprising a bit pattern encoding data from an assay of the
invention in a file format suitable for retrieval and processing in a computerized sequence
analysis, comparison, or relative quantitation method.

The invention also provides a network, comprising a plurality of computing
devices linked via a data link, such as an Ethernet cable (coax or 10BaseT), telephone line,
ISDN line, wireless network, optical fiber, or other suitable signal transmission medium,
whereby at least one network device (e.g., computer, disk array, etc.) comprises a pattern of
magnetic domains (e.g., magnetic disk) and/or charge domains (e.g., an array of DRAM
cells) composing a bit pattern encoding data acquired from an assay of the invention.

The invention also provides a method for transmitting expression profiling
data that includes generating an electronic signal on an electronic communications device,
such as a modem, ISDN terminal adapter, DSL, cable modem, ATM switch, or the like,
wherein the signal includes (in native or encrypted format) a bit pattern encoding data from
an assay or a database comprising a plurality of assay results obtained by the method of the
invention.

In a preferred embodiment, the invention provides a computer system for
comparing a query target to a database containing an array of data structures, such as an
expression profiling result obtained by the method of the invention, and ranking database
entries based on the degree of identity with one or more reference sets of Tables 1-6. A central
processor is preferably initialized to load and execute the computer program for comparison
of the expression profiling results. Data for a query target is entered into the central
processor via an I/O device. Execution of the computer program results in the central
processor retrieving the expression profiling data from the data file, which comprises a
binary description of an expression profiling result.

The expression profiling data and the computer program can be transferred to
secondary memory, which is typically random access memory (e.g., DRAM, SRAM,
SGRAM, or SDRAM). Expression profiles are ranked according to the degree of
correspondence between an expression profile and one or more reference sets of the Tables
1-6. Results are output via an I/O device. For example, a central processor can be a
conventional computer (e.g., Intel Pentium, PowerPC, Alpha, PA-8000, SPARC, MIPS
4400, MIPS 10000, VAX, etc.); a program can be a commercial or public domain molecular
biology software package (e.g., UWGCG Sequence Analysis Software, Darwin); a data file
can be an optical or magnetic disk, a data server, a memory device (e.g., DRAM, SRAM,
SGRAM, SDRAM, EPROM, bubble memory, flash memory, etc.); an I/O device can be a
terminal comprising a video display and a keyboard, a modem, an ISDN terminal adapter,
an Ethernet port, a punched card reader, a magnetic strip reader, or other suitable I/O
device.

The invention also provides the use of a computer system, such as that
described above, which comprises: (1) a computer; (2) a stored bit pattern encoding a
collection of expression profiles obtained by the methods of the invention, which may be
stored in the computer; (3) reference sets of the Tables 1-6; and (4) a program for
comparison, typically with rank-ordering of comparison results on the basis of computed
similarity values.
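
The rank-ordering of stored expression profiles against a reference set can be sketched as follows. This is only one possible reading: the specification does not name a similarity measure, so Pearson correlation is assumed here, and the function names, data layout, and expression values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_profiles(profiles, reference):
    """Return (sample_id, similarity) pairs sorted from most to least similar.

    profiles:  dict mapping sample id -> dict of gene id -> expression level
    reference: dict mapping gene id -> reference expression level
    Only genes present in the reference set are compared; a gene missing
    from a profile is treated as unexpressed (0.0).
    """
    genes = sorted(reference)
    ref_vec = [reference[g] for g in genes]
    scored = []
    for sample_id, profile in profiles.items():
        vec = [profile.get(g, 0.0) for g in genes]
        scored.append((sample_id, pearson(vec, ref_vec)))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Accessions are taken from Table 1; all expression values are invented.
reference = {"AF134160": 8.0, "AU076643": 9.5, "AW590680": 2.0}
profiles = {
    "sample_A": {"AF134160": 7.5, "AU076643": 9.0, "AW590680": 2.5},
    "sample_B": {"AF134160": 2.0, "AU076643": 3.0, "AW590680": 9.0},
}
ranking = rank_profiles(profiles, reference)
```

With these invented values, sample_A tracks the reference closely and is ranked first, while sample_B is anti-correlated and is ranked last.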

EXAMPLES

EXAMPLE 1: Identification of the Metastatic Potential of a Colorectal Cancer Tissue
Sample Using Nucleic Acid and Antibody Based Assays

RNA can be extracted from tissue samples, and the presence or absence of
metastatic colorectal cancer can be determined by comparing the expression profile of
classifier genes in the sample to the defined sets of genes of the Tables 1-6. Analysis of the
expression profile can be carried out by measuring expression levels of classifier gene
mRNA or protein.

For example, tissue is obtained from a non-metastatic Dukes' stage B primary tumor
and from colorectal cancer that has progressed to end-stage liver metastasis. Expression
profiles of classifier genes from each sample are generated by creating an expression profile
of either nucleic acid based data or protein based data. The information obtained in the
expression profiling is then analyzed and compared so that the relative expression levels of
classifier genes in the two samples are used to create reference sets of genes such as those
provided in the Tables 1-6. Expression patterns from samples whose disease state is
unknown can then be compared to the defined sets of classifier genes in the Tables 1-6 and
the presence or absence of metastatic colorectal cancer is diagnosed. If metastatic
colorectal cancer is diagnosed, then further analysis of the data can reveal the stage of the
disease and the probable prognosis.
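
One way to turn the relative expression levels of two samples into a candidate reference set is to keep the genes whose log2 ratio exceeds a cutoff. The sketch below assumes a two-fold cutoff and a noise floor of 1.0; neither value, nor the function name, comes from the specification, and real studies would use replicates and a statistical test rather than a single pair of samples.

```python
from math import log2

def differential_genes(primary, metastasis, min_abs_log2=1.0, floor=1.0):
    """Return {gene: log2(metastasis / primary)} for genes changed beyond the cutoff.

    primary, metastasis: dicts mapping gene id -> expression level.
    Levels below `floor` are raised to `floor` to avoid division by zero
    and to damp ratios dominated by noise.
    """
    selected = {}
    for gene in primary.keys() & metastasis.keys():
        ratio = log2(max(metastasis[gene], floor) / max(primary[gene], floor))
        if abs(ratio) >= min_abs_log2:
            selected[gene] = ratio
    return selected

# Hypothetical gene ids and invented intensities.
primary = {"gene_1": 100.0, "gene_2": 200.0, "gene_3": 50.0}
metastasis = {"gene_1": 400.0, "gene_2": 210.0, "gene_3": 12.5}
reference_set = differential_genes(primary, metastasis)
```

Here gene_1 (four-fold up) and gene_3 (four-fold down) are retained, while gene_2 is dropped because it barely changes.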

The analysis of mRNA is preferred. For mRNA analysis, labeled, e.g.,
fluorescent or biotinylated, RNA from the unknown sample may be analyzed with an
oligonucleotide microarray comprising sequences corresponding to the classifier genes of
the Tables 1-6. Techniques for analysis and set up of the microarrays are known in the art.

Results of the analysis are used to identify which classifier genes are
expressed and the level of their expression (as judged by the intensity of the signal). The
pattern generated by the microarray analysis is then compared to the defined sets of genes of
the Tables 1-6, and a determination of whether metastatic colorectal cancer is present is
made. If metastatic disease is present, the stage of the disease can also be determined.
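
The comparison of a microarray pattern to a defined gene set can be sketched as a simple direction-matching score: for each signature gene, check whether the sample's signal moved in the expected direction relative to a baseline. The scoring rule, the baseline sample, and all names below are assumptions for illustration; the specification leaves the comparison method open.

```python
def signature_score(sample, baseline, signature):
    """Fraction of signature genes whose change from baseline matches the expected direction.

    sample, baseline: dicts mapping gene id -> signal intensity
    signature: dict mapping gene id -> +1 (expected up) or -1 (expected down)
    Genes missing from either sample or baseline are skipped.
    """
    matched = 0
    scored = 0
    for gene, direction in signature.items():
        if gene not in sample or gene not in baseline:
            continue
        scored += 1
        change = sample[gene] - baseline[gene]
        if (direction > 0 and change > 0) or (direction < 0 and change < 0):
            matched += 1
    return matched / scored if scored else 0.0

# Hypothetical gene ids and invented intensities.
signature = {"gene_a": +1, "gene_b": -1, "gene_c": +1}
baseline = {"gene_a": 5.0, "gene_b": 5.0, "gene_c": 5.0}
sample = {"gene_a": 8.0, "gene_b": 2.0, "gene_c": 4.0}
score = signature_score(sample, baseline, signature)
```

In this example two of the three signature genes move as expected, so the score is 2/3; a decision threshold on the score would have to be chosen and validated separately.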

In another embodiment, an expression profile of a sample is generated by
examining the protein expression pattern of the sample. In this embodiment, total protein is
extracted from a sample of the tissue (e.g., liver). Total protein is run on an acrylamide gel,
then analyzed by western blot using antibodies against the products of classifier genes of the
Tables 1-6. As in the case of mRNA analysis, the expression pattern revealed in the western
blot is compared to the defined sets of genes of the Tables 1-6. A match between the
expression pattern of the sample and a particular defined set or sets of genes of the Tables
1-6 will permit the determination of whether or not cancer is present.

The defined sets of classifier genes of the Tables 1-6 are superior in their
predictive power, because their expression strongly correlates with colorectal cancer
metastasis. These defined sets of genes therefore provide ready tools for the diagnosis and
prognosis evaluation of cancer, particularly metastatic colorectal cancer.



EXAMPLE 2: Protein Based Determination of Classifier Gene Expression and
Quantification of Expression Levels Using 2-Dimensional Gel Electrophoresis

The expression pattern of classifier genes can be determined from the
expression pattern of the corresponding proteins. Classifier proteins can be identified, e.g.,
by their positions on a gel following 2-dimensional gel electrophoresis of a sample of tissue
subject to analysis.

Methods of 2-dimensional gel electrophoresis are well known in the art.
Well characterized proteins, such as the products of the classifier genes of the Tables 1-6,
can be isolated on the basis of their unique placement within a gel after separation according
to, for example, isoelectric point in the first dimension and molecular size in the second
dimension. Thus, it is possible to determine relative expression levels of classifier proteins
in a sample, as well as absolute expression levels of classifier proteins, without the need for
preparation of classifier protein specific antibodies.

Expression profiles of classifier genes generated in this manner can be
compared with the defined sets of genes of the Tables 1-6 and the metastatic potential of the
sample can thereby be determined.
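
Identifying classifier proteins by gel position and estimating their relative abundance can be sketched as a coordinate match followed by normalization to total spot volume. The tolerances, protein names, and coordinates below are hypothetical, and real 2-D gel software performs image registration and background subtraction that this sketch omits.

```python
def identify_spots(spots, known, pi_tol=0.1, mw_tol=2.0):
    """Match gel spots to known proteins and return relative abundances.

    spots: list of (isoelectric_point, mass_kda, volume) tuples from one gel
    known: dict mapping protein name -> (isoelectric_point, mass_kda)
    A spot matches a protein when both coordinates fall within tolerance.
    Abundance is the matched spot volume divided by the total spot volume.
    """
    total = sum(volume for _, _, volume in spots)
    if total == 0:
        return {}
    levels = {}
    for name, (pi, mw) in known.items():
        for spot_pi, spot_mw, volume in spots:
            if abs(spot_pi - pi) <= pi_tol and abs(spot_mw - mw) <= mw_tol:
                levels[name] = levels.get(name, 0.0) + volume / total
    return levels

# Invented spot list and hypothetical reference coordinates.
spots = [(5.0, 50.0, 300.0), (7.2, 30.0, 100.0), (9.0, 80.0, 600.0)]
known = {"protein_X": (5.05, 51.0), "protein_Y": (7.2, 30.5)}
levels = identify_spots(spots, known)
```

Normalizing to total spot volume gives relative levels that can be compared between gels loaded with different amounts of protein; absolute quantitation would additionally require a calibration standard.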



Table 1: Genes Differentially Regulated in Metastatic Colorectal Cancer

Cluster | Exemplar Accession | UniGene ID | UniGene Title
1 | NA | Hs.76297 | G protein-coupled receptor kinase 6 (GPRK6), mRNA.
1 | NM_173483 | NA | NM_173483 Homo sapiens hypothetical protein FLJ39501 (FLJ39501)
1 | NM_003468.2 | NA | NM_003468.2 Homo sapiens frizzled homolog 5 (Drosophila) (FZD5), mRNA
1 | NA | NA | Target Exon
1 | AC007050.25 | NA | ESTs
1 | NA | NA | Target Exon
1 | W25945 | Hs.8173 | hypothetical protein FLJ10803
1 | AW054922 | Hs.53478 | Homo sapiens cDNA FLJ12366 fis, clone MAMMA1002411
1 | AW847814 | Hs.289005 | Homo sapiens cDNA: FLJ21532 fis, clone COL06049
1 | BE244200 | Hs.406243 | KIAA0410 gene product
1 | AW514668 | Hs.194258 | ESTs, Moderately similar to ALU5_HUMAN ALU SUBFAMILY SC SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
1 | AA249096 | Hs.32793 | ESTs
1 | L26953 | Hs.1010 | regulator of mitotic spindle assembly 1
1 | AI381687 | Hs.404198 | ESTs
1 | N99638 | Hs.87409 | gb:za39g11.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone 5' similar to contains Alu repetitive element;, mRNA sequence
1 | AI205785 | Hs.190153 | ESTs
1 | AW965212 | Hs.278871 | hypothetical protein FLJ30921 (FLJ30921), mRNA.
1 | AL119442 | Hs.380968 | eukaryotic translation initiation factor 4 gamma, 2
1 | AA358045 | NA | gb:EST66944 Fetal lung III Homo sapiens cDNA 5' end similar to EST containing Alu repeat, mRNA sequence
1 | AL050276 | Hs.159456 | zinc finger protein 288
1 | AI052358 | Hs.131741 | ESTs
1 | AW976570 | Hs.97387 | ESTs
1 | AI936504 | Hs.2083 | CDC-like kinase 1
1 | AA400079 | Hs.257854 | ESTs
1 | AW883367 | Hs.356546 | hypothetical protein MGC5306
1 | AA417696 | Hs.372121 | ESTs


1 | AA470152 | Hs.368209 | ESTs
1 | AW971375 | Hs.292921 | ESTs
1 | AW971070 | Hs.291160 | ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
1 | T87431 | Hs.190738 | ESTs
1 | AA531129 | Hs.190297 | ESTs
1 | AW439330 | Hs.256889 | ESTs, Weakly similar to 2109260A B cell growth factor [H.sapiens]
1 | AW157424 | Hs.280685 | ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]
1 | AB040966 | Hs.83575 | KIAA1533 protein
1 | AW188370 | Hs.250383 | Homo sapiens cDNA FLJ14279 fis, clone PLACE1005574
1 | AA628539 | Hs.57783 | Homo sapiens eukaryotic translation initiation factor 3, subunit 9 eta, 116kDa (EIF3S9)
1 | AA640770 | Hs.200994 | EST
1 | AA664078 | NA | gb:ac04a05.s1 Stratagene lung (937210) Homo sapiens cDNA clone 3' similar to contains Alu repetitive element;, mRNA sequence
1 | AA886511 | Hs.189282 | Homo sapiens cDNA: FLJ21429 fis, clone COL04205
1 | AA830893 | Hs.119769 | ESTs
1 | BE327477 | Hs.166941 | ESTs
1 | AI821940 | Hs.72071 | hypothetical protein FLJ20038
1 | AL137723 | Hs.5855 | Homo sapiens mRNA; cDNA DKFZp434D0818 (from clone DKFZp434D0818)
1 | AA769874 | Hs.155287 | ubiquitin-protein isopeptide ligase (E3)
1 | AI126162 | Hs.129037 | ESTs
1 | AW748336 | Hs.168052 | KIAA0421 protein
1 | AW083789 | Hs.124620 | ESTs
1 | AI034357 | Hs.211194 | ESTs, Weakly similar to ALU8_HUMAN ALU SUBFAMILY SX SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
1 | AW827419 | Hs.144139 | ESTs
1 | BE262656 | Hs.32603 | hypothetical protein MGC3279 similar to collectins
1 | AW469180 | Hs.346398 | ESTs
1 | AI492857 | NA | gb:th72h08.x1 Soares_NhHMPu_S1 Homo sapiens cDNA clone 3', mRNA sequence
1 | AW451347 | Hs.175862 | ESTs
1 | AI698091 | Hs.107845 | ESTs
1 | AJ010046 | Hs.25155 | neuroepithelial cell transforming gene 1
1 | AL043983 | Hs.125063 | Homo sapiens cDNA FLJ13825 fis, clone THYRO1000558
1 | AW3S2884 | Hs.5320 | ESTs
1 | BE378541 | Hs.279815 | cysteine sulfinic acid decarboxylase-related protein 2
1 | R66282 | Hs.20247 | ESTs, Weakly similar to S65657 alpha-1C-adrenergic receptor splice form 2 [H.sapiens]
1 | BE086548 | Hs.42346 | calcineurin-binding protein calsarcin-1
1 | AA907305 | Hs.36475 | ESTs


2 | AF083130 | Hs.381498 | Homo sapiens CATX-14 mRNA, partial cds
2 | NM_032446.1 | NA | NM_032446.1 Homo sapiens MEGF10 protein (MEGF10), mRNA
2 | NA | NA | Target Exon
2 | AW152207 | Hs.270977 | ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]
2 | AA601038 | Hs.191797 | ESTs, Weakly similar to S65657 alpha-1C-adrenergic receptor splice form 2 [H.sapiens]
2 | U28831 | Hs.44566 | KIAA1641 protein
2 | AV660717 | Hs.47144 | DKFZP586N0819 protein
2 | AW444816 | Hs.171537 | hypothetical protein FLJ21596
2 | AW589558 | Hs.299883 | hypothetical protein FLJ23399
2 | AW590680 | Hs.355571 | von Willebrand factor
2 | AW770280 | Hs.36258 | ESTs, Moderately similar to JC5238 galactosylceramide-like protein, GCP [H.sapiens]
2 | AW451618 | Hs.380683 | ESTs
2 | BE242691 | Hs.14947 | ESTs
2 | AI056689 | Hs.133538 | ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
2 | BE081585 | NA | gb:QV2-BT0635-210400-156-b07 BT0635 Homo sapiens cDNA, mRNA sequence
2 | AI056885 | Hs.133539 | ESTs
2 | BE336632 | Hs.278850 | hypothetical protein FLJ13687
2 | AA827082 | Hs.291872 | ESTs
2 | R11661 | Hs.14165 | ESTs, Moderately similar to ALU5_HUMAN ALU SUBFAMILY SC SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
2 | R39769 | Hs.379238 | noia, lYiuucidLciy oimiidr to ALU o riuivi/viN /vl-U o U JL> Jt /MVl 1L> i ibA orlV^UiiiNL/ii CONTAMINATION WARNING ENTRY [H.sapiens]
2 | AA188645 | Hs.250638 | Homo sapiens mRNA full length insert cDNA clone EUROIMAGE 152428
2 | C75563 | Hs.113029 | ribosomal protein S25
2 | U90916 | Hs.82845 | Homo sapiens cDNA: FLJ21930 fis, clone HEP04301, highly similar to HSU90916 Human clone 23815 mRNA sequence
2 | AA601036 | Hs.285083 | ESTs
2 | BE271922 | Hs.406392 | ESTs, Weakly similar to zinc finger protein [H.sapiens]
2 | AA830402 | Hs.221216 | ESTs
2 | AW975051 | Hs.192044 | ESTs, Weakly similar to I78885 serine/threonine-specific protein kinase [H.sapiens]
2 | AL080172 | Hs.105894 | hypothetical protein FLJ21919
2 | AA310919 | Hs.7369 | Homo sapiens cDNA FLJ14343 fis, clone THYRO1000916
2 | AI457640 | Hs.206632 | ESTs
2 | AA335715 | Hs.98132 | ESTs
2 | T94907 | Hs.188572 | ESTs
2 | AI174861 | Hs.190623 | ESTs
2 | AW881411 | Hs.169078 | hypothetical protein FLJ23018
2 | AA554827 | Hs.370705 | DKFZp434A0131 protein
2 | H72531 | Hs.36190 | ESTs
2 | AL042436 | Hs.97723 | ESTs
2 | AI656478 | Hs.321622 | hypothetical protein FLJ20363
2 | AA417614 | Hs.136825 | ESTs
2 | AI016712 | Hs.287797 | integrin, beta 1 (fibronectin receptor, beta polypeptide, antigen CD29 includes MDF2, MSK12)
2 | AA769365 | Hs.126058 | ESTs
2 | AA464964 | NA | gb:zx80f10.s1 Soares ovary tumor NbHOT Homo sapiens cDNA clone 3', mRNA sequence
2 | AA847744 | Hs.370675 | ESTs
2 | AW079559 | Hs.152258 | ESTs
2 | AI417881 | Hs.292464 | ESTs
2 | BE350122 | Hs.157367 | ESTs, Weakly similar to I78885 serine/threonine-specific protein kinase [H.sapiens]
2 | AA503053 | Hs.81474 | ESTs
2 | AA699965 | Hs.369440 | ESTs


2 | AI660840 | Hs.191202 | ESTs, Weakly similar to ALUE_HUMAN !!!! ALU CLASS E WARNING ENTRY !!! [H.sapiens]
2 | AI341227 | Hs.157106 | ESTs
2 | AA830532 | Hs.372176 | ESTs
2 | BE217838 | Hs.152492 | ESTs
2 | AA878324 | NA | ESTs
2 | AW362945 | Hs.162459 | ESTs
2 | AW296280 | Hs.152016 | Homo sapiens cDNA: FLJ22140 fis, clone HEP20977
2 | AI241331 | Hs.75113 | general transcription factor IIIA
2 | AF039697 | Hs.132883 | serologically defined colon cancer antigen 31
2 | AW390125 | Hs.240443 | Homo sapiens cDNA: FLJ23538 fis, clone LNG08010, highly similar to BETA2 Human MEN1 region clone epsilon/beta mRNA
2 | AI208611 | Hs.333555 | Homo sapiens cDNA FLJ11720 fis, clone HEMBA1005293
2 | AA610649 | Hs.333239 | ESTs
2 | AF119913 | Hs.404158 | Homo sapiens PRO3077 mRNA, complete cds
2 | AF132730 | Hs.149784 | hypothetical protein
2 | AW974949 | Hs.87409 | ESTs
2 | AI654144 | Hs.271511 | ESTs, Weakly similar to I78885 serine/threonine-specific protein kinase [H.sapiens]
2 | R26877 | Hs.24128 | ESTs
2 | BE551618 | Hs.82285 | phosphoribosylglycinamide formyltransferase, phosphoribosylglycinamide synthetase, phosphoribosylaminoimidazole synthetase
2 | AA744692 | Hs.166539 | ESTs
2 | AL038624 | Hs.208752 | ESTs, Weakly similar to ALU8_HUMAN ALU SUBFAMILY SX SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]


2 


AL080280 


H<? 383970 


ffh'Homf* <;or»ipnc mRTvI A fiill lpn<yiV> incprf r*T~Y?\JA firm* 1 T3T TR OTTWT A dTS 5? ^QO*\ 


2 | AA766142 | Hs.131810 |
2 | BE466173 | Hs.145696 |
2 | W7R940 | Hs.20526 | ESTs
2 | AI767388 | Hs.37890 | Human DNA sequence from clone RP5-1024N4 on chromosome 1p32.1-33. Contains the gene for a novel Sodium:solute symporter family member similar to SLC5A1 (SGLT1), a pseudogene similar to part of butyrophilin family members, a novel gene, ESTs, STSs, GS
2 | R71264 | Hs.16798 | ESTs
2 | BE550891 | Hs.270624 | ESTs
2 | NM_014135 | Hs.8345 | PRO0641 protein
2 | AI076570 | Hs.134053 | ESTs
2 | AI371823 | Hs.34079 | ESTs
2 | AF169312 | Hs.9613 | PPAR(gamma) angiopoietin related protein
2 | AI344782 | Hs.349261 | DnaJ (Hsp40) homolog, subfamily C, member 3
2 | AI174603 | Hs.254105 | enolase 1, (alpha)
2 | AL040482 | Hs.286173 | KIAA1595 protein
2 | AI670843 | Hs.370292 | ESTs
2 | AI022813 | Hs.92679 | Homo sapiens clone CDABP0014 mRNA sequence
2 | AF113925 | Hs.19405 | caspase recruitment domain 4
2 | H65629 | Hs.245997 | ESTs
2 | T62926 | Hs.304184 | ESTs
2 | AA353125 | Hs.184721 | ESTs
2 | N33622 | NA | gb:yv22h10.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone 3', mRNA sequence
2 | AA002207 | Hs.17385 | Homo sapiens clone IMAGE:119716, mRNA sequence
2 | AB020714 | Hs.24656 | KIAA0907 protein
2 | AI218945 | Hs.226925 | ESTs
2 | AA847992 | Hs.137003 | ESTs
2 | AI924046 | Hs.119567 | ESTs, Weakly similar to A47582 B-cell growth factor precursor [H.sapiens]
2 | AL040914 | NA | gb:DKFZp434J2015_s1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434J2015 3', mRNA sequence
2 | AA683416 | Hs.209061 | sudD suppressor of bimD6 homolog (A. nidulans) (SUDD), transcript variant 1, mRNA.
2 | AW058464 | Hs.386465 | protein with polyglutamine repeat; calcium (ca2) homeostasis endoplasmic reticulum protein
2 | BE549380 | Hs.307034 | Homo sapiens, clone IMAGE:3460539, mRNA, partial cds


3 | U49973 | NA | gb:Human Tigger1 transposable element, complete consensus sequence.
3 | AI689496 | Hs.108932 | ESTs
3 | AW293452 | Hs.16228 | ESTs
3 | AA776721 | Hs.85603 | down-regulated by Ctnnb1, a
3 | AA581602 | Hs.41840 | ESTs
3 | AI801098 | Hs.151500 | ESTs
3 | AA740616 | NA | gb:ny97f11.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone 3', mRNA sequence
3 | AI807519 | Hs.104520 | Homo sapiens cDNA FLJ13694 fis, clone PLACE2000115
3 | AA327092 | NA | ESTs
3 | AA602917 | Hs.325520 | LAT1-3TM protein
3 | NM_005781 | Hs.153937 | activated p21cdc42Hs kinase
3 | AA640987 | Hs.193767 | ESTs
3 | AA135370 | Hs.188536 | Homo sapiens cDNA: FLJ21635 fis, clone COL08233, highly similar to AF131819 Homo sapiens clone 24838 mRNA sequence
3 | AW296451 | Hs.24605 | ESTs
3 | AW299534 | Hs.105739 | ESTs
3 | U26710 | Hs.3144 | Cas-Br-M (murine) ectropic retroviral transforming sequence b
3 | AW362803 | Hs.166271 | ESTs
3 | AW975895 | NA | ESTs
3 | AW450376 | Hs.378828 | KIAA0665 gene product
3 | AI002106 | Hs.15670 | ESTs
3 | AA811347 | NA | gb:ob81h06.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone 3', mRNA sequence
3 | AI798851 | Hs.356716 | hemoglobin, gamma G
3 | F06700 | Hs.7879 | interferon-related developmental regulator 1
3 | AI564835 | Hs.381225 | ESTs, Weakly similar to Z195_HUMAN ZINC FINGER PROTEIN 195 [H.sapiens]
3 | AW016607 | Hs.201582 | ESTs
3 | AB007928 | Hs.374987 | KIAA0459 protein
3 | S72043 | Hs.73133 | metallothionein 3 (growth inhibitory factor (neurotrophic))
3 | AA228357 | Hs.399939 | gb:nc39d05.r1 NCI_CGAP_Pr2 Homo sapiens cDNA clone, mRNA sequence


4 | AA130986 | Hs.271627 | ESTs
4 | T64896 | Hs.406798 | Homo sapiens cDNA FLJ11533 fis, clone HEMBA1002678
4 | AA132637 | Hs.15396 | Homo sapiens, clone IMAGE:3948909, mRNA, partial cds
4 | AA317962 | Hs.249721 | ESTs, Moderately similar to PC4259 ferritin associated protein [H.sapiens]
4 | AW167439 | Hs.190651 | Homo sapiens cDNA FLJ13625 fis, clone PLACE1011032
4 | AW452823 | Hs.135268 | ESTs
4 | AA132255 | Hs.143951 | ESTs
4 | D83782 | Hs.78442 | SREBP CLEAVAGE-ACTIVATING PROTEIN
4 | AI690465 | Hs.201661 | ESTs, Weakly similar to JC5238 galactosylceramide-like protein, GCP [H.sapiens]
4 | R07785 | Hs.429867 | ESTs
4 | AL041465 | Hs.182982 | golgin-67
4 | AW183695 | Hs.370907 | ESTs
4 | AW276914 | Hs.423341 | Homo sapiens clone IMAGE:713177, mRNA sequence
4 | U50535 | Hs.110630 | Human BRCA2 region, mRNA sequence CG006
4 | AF073931 | Hs.122359 | calcium channel, voltage-dependent, alpha 1H subunit
4 | AW341131 | Hs.146345 | ESTs
4 | BE176694 | Hs.279860 | tumor protein, translationally-controlled 1
4 | AW963118 | Hs.161784 | ESTs
4 | AW513691 | Hs.270149 | ESTs, Weakly similar to 2109260A B cell growth factor [H.sapiens]
4 | BE173380 | Hs.381903 | ESTs
4 | Z29067 | Hs.2236 | NIMA (never in mitosis gene a)-related kinase 3
4 | AA425310 | Hs.155766 | ESTs, Weakly similar to A47582 B-cell growth factor precursor [H.sapiens]
4 | AW973253 | Hs.292689 | ESTs
4 | AA453987 | Hs.144802 | ESTs
4 | AA612710 | Hs.284148 | ESTs
4 | AA830335 | Hs.105273 | ESTs
4 | AW970859 | Hs.313503 | ESTs
4 | AA532718 | Hs.178604 | ESTs
4 | AI459519 | Hs.314437 | clone IMAGE:4607209, mRNA sequence [H.sapiens]
4 | BE263901 | Hs.381222 | ESTs, Weakly similar to S37431 ankyrin 2, neuronal long splice form [H.sapiens]
4 | AI301080 | Hs.35276 | KIAA0852 protein
4 | AW975009 | Hs.292274 | ESTs, Weakly similar to A46010 X-linked retinopathy protein [H.sapiens]
4 | AA677540 | Hs.117064 | ESTs
4 | H74319 | Hs.188620 | ESTs
4 | AI800041 | Hs.369733 | ESTs
4 | AL360140 | Hs.176005 | Homo sapiens mRNA full length insert cDNA clone EUROIMAGE 113222
4 | AF134160 | Hs.7327 | claudin 1
4 | AI982794 | Hs.159473 | ESTs
4 | AK001631 | Hs.8083 | hypothetical protein FLJ10769
4 | W22152 | Hs.282929 | ESTs
4 | H77824 | NA | ESTs
4 | AU076643 | Hs.313 | secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation
4 | AW958124 | Hs.142442 | HP1-BP74
4 | AL137714 | Hs.356298 | hypothetical protein LOC58481
4 | AA001266 | Hs.133521 | ESTs
4 | AL133100 | Hs.377705 | hypothetical protein FLJ20531
4 | AA001615 | Hs.84561 | ESTs
4 | AA568515 | Hs.293510 | ESTs
4 | AW079749 | Hs.184719 | ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
4 | AL045285 | Hs.277401 | bromodomain adjacent to zinc finger domain, 2A
4 | AI740647 | Hs.141012 | ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
4 | AW976347 | Hs.76966 | ESTs
4 | AI191811 | Hs.54629 | ESTs


5 | NA | NA | Target Exon
5 | NA | NA | Target Exon


5 


NA 


NA 


P70091 ?Q*-ail^fi^RQ^7lcrhl A APIfi^OI 11 ( ACTIO, AW7\ crr\ cnnnrlin mnf m Jilrp* cirrvilar tr\ 

y^i\j\j£LZ.y .gij_>ujo^j / jgu|/\vrt.v_yj)oo\j i . 1 1 ^/w./Uufo / / ) aco-sponain-muciii'-iiKe, i>miiid.r io 
P98167 ( 


5 | AW883529 | Hs.173830 | ESTs, Weakly similar to ALU7_HUMAN ALU SUBFAMILY SQ SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
5 | AW969543 | Hs.144609 | mitogen-activated protein kinase kinase kinase 13
5 | AW854536 | NA | gb:RC3-CT0255-200100-024-a08 CT0255 Homo sapiens cDNA, mRNA sequence
5 | AA156657 | Hs.332383 | ESTs
5 | N65993 | Hs.294003 | ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
5 | BE275835 | NA | gb:601121639F1 NIH_MGC_20 Homo sapiens cDNA clone 5', mRNA sequence
5 | H02480 | Hs.79592 | ESTs
5 | AL038450 | Hs.48948 | ESTs
5 | AA177088 | Hs.190065 | ESTs
5 | AA203569 | Hs.191482 | ESTs
5 | AI253112 | Hs.133540 | ESTs
5 | T85105 | NA | ESTs
5 | AI972919 | Hs.118837 | obscurin, cytoskeletal calmodulin and titin-interacting RhoGEF
5 | AA304999 | Hs.27301 | ESTs, Weakly similar to similar to KIAA0855 [H.sapiens]
5 | AA284447 | Hs.271887 | ESTs
5 | AF182277 | Hs.330780 | cytochrome P450, subfamily IIB (phenobarbital-inducible), polypeptide 7
5 | AI760018 | Hs.205071 | ESTs
5 | R66740 | Hs.110613 | KIAA0220 protein
5 | BE296394 | NA | gb:601176734F1 NIH_MGC_17 Homo sapiens cDNA clone 5', mRNA sequence
5 | AW960454 | NA | ESTs
5 | H57111 | Hs.221132 | ESTs
5 | R42755 | Hs.23096 | ESTs
5 | AA367069 | Hs.100636 | ESTs
5 | AL049987 | Hs.166361 | Homo sapiens mRNA; cDNA DKFZp564F112 (from clone DKFZp564F112)
5 | AI767152 | Hs.181400 | ESTs, Weakly similar to I78885 serine/threonine-specific protein kinase [H.sapiens]
5 | AW971063 | Hs.292882 | ESTs
5 | AI494291 | Hs.369171 | ESTs
5 | AI734110 | Hs.136355 | ESTs
5 | AI123657 | Hs.169755 | ESTs, Weakly similar to JC5314 CDC28/cdc2-like kinase associating arginine-serine cyclophilin [H.sapiens]
5 | AA488953 | NA | gb:aa55e05.r1 NCI_CGAP_GCB1 Homo sapiens cDNA clone 5', mRNA sequence
5 | AW295859 | Hs.235860 | ESTs






WO 2004/090547 



PCT/US2004/010465 



5 | AA806538 | Hs.130732 | KIAA1575 protein
5 | AL040360 | Hs.162203 | ESTs, Weakly similar to alternatively spliced product using exon 13A [H.sapiens]
5 | N38913 | Hs.221575 | ESTs
5 | AW971983 | Hs.293003 | cation channel, sperm associated 2 (CATSPER2), transcript variant 1, mRNA.
5 | AI343966 | Hs.158528 | ESTs
5 | AW136134 | Hs.220277 | ESTs
5 | AW450922 | Hs.112478 | ESTs
5 | AA609738 | Hs.16525 | ESTs
5 | AA613792 | NA | gb:no97h03.s1 NCI_CGAP_Pr2 Homo sapiens cDNA clone, mRNA sequence
5 | AI631749 | Hs.156616 | ESTs, Weakly similar to alternatively spliced product using exon 13A [H.sapiens]
5 | H56995 | Hs.37372 | Homo sapiens DNA binding peptide mRNA, partial cds
5 | AI624436 | Hs.310286 | ESTs
5 | AW374941 | Hs.87409 | ESTs
5 | AW974957 | Hs.288719 | Homo sapiens cDNA FLJ12142 fis, clone MAMMA1000356
5 | AA737345 | Hs.294041 | ESTs
5 | AA888311 | Hs.17602 | Homo sapiens cDNA FLJ12381 fis, clone MAMMA1002566
5 | AW295687 | Hs.254420 | ESTs
5 | AA757900 | Hs.270823 | ESTs, Weakly similar to S65657 alpha-1C-adrenergic receptor splice form 2 [H.sapiens]
5 | AI916685 | Hs.371850 | ESTs
5 | BE273296 | Hs.3069 | Homo sapiens cDNA FLJ13255 fis, clone OVARC1000800, moderately similar to MITOCHONDRIAL STRESS-70 PROTEIN PRECURSOR
5 | AA808948 | Hs.378776 | ESTs, Moderately similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
5 | BE046594 | NA | gb:hn41c11.x1 NCI_CGAP_RDF2 Homo sapiens cDNA clone 3', mRNA sequence
5 | AI277986 | Hs.164875 | ESTs
5 | AA830144 | Hs.135613 | ESTs, Moderately similar to I38022 hypothetical protein [H.sapiens]
5 | BE159253 | Hs.300638 | ESTs
5 | BE561880 | NA | gb:601346073F1 NIH_MGC_8 Homo sapiens cDNA clone 5', mRNA sequence
5 | AI565071 | Hs.369984 | ESTs
5 | AI184717 | Hs.372653 | ESTs
5 | AI052572 | NA | ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
5 | AI056776 | Hs.133397 | ESTs, Weakly similar to I78885 serine/threonine-specific protein kinase [H.sapiens]
5 | AI123195 | Hs.47783 | gb:oo17a10.x1 Soares_NSF_F8_9W_OT_PA_P_S1 Homo sapiens cDNA clone 3' similar to TR:Q16673 Q16673 PMS7 MRNA ;contains OFR.t1 OFR repetitive element ;, mRNA sequence
5 | AI565004 | Hs.374415 | cathepsin D (lysosomal aspartyl protease)
5 | AI858635 | Hs.144763 | ESTs
5 | AL049951 | Hs.22370 | Homo sapiens mRNA; cDNA DKFZp564O0122 (from clone DKFZp564O0122)
5 | AI880843 | Hs.370296 | ESTs
5 | AI653006 | Hs.195374 | ESTs
5 | AI990790 | Hs.188614 | ESTs
5 | AA004681 | Hs.59432 | ESTs
5 | AA004906 | Hs.404424 | ESTs
5 | AI826999 | Hs.224624 | ESTs
5 | AA737314 | Hs.194324 | hypothetical protein FLJ12634
5 | AA011616 | NA | ESTs
5 | AW504178 | Hs.222731 | ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]
5 | AB032995 | Hs.26440 | two-pore channel 1, homolog
5 | AA454220 | Hs.61170 | ESTs
5 | AI914925 | Hs.222240 | ESTs
5 | BE066058 | Hs.269233 | ESTs, Moderately similar to I78885 serine/threonine-specific protein kinase [H.sapiens]
5 | H62793 | Hs.268945 | ESTs
5 | AW295097 | Hs.200260 | ESTs


6 


AA075144 


Hs.401448 


gb:zm86f06.sl Stratagene ovarian cancer (937219) Homo sapiens cDNA clone 
IMAGE:544835 3* similar to gb:X 16064 TRANSLATIONALLY CONTROLLED 
TUMOR PROTEIN (HUMAN);, mRNA sequence. 


6 


AI539227 


Hs.214039 


hypothetical protein FLJ23556 


6 


AA031576 


Hs. 143 8 12 


Homo sapiens cDNA FU12956 fis, clone NT2RP2005501 


6 


AF045458 


Hs.47061 


unc-51 (C. elegans)-like kinase 1 


6 


AW631439 


NA 


Homo sapiens cDNA FLJ1 1582 fis, clone HEMBA1003656 


6 


NM 014760 


Hs.75863 


KIAA02 1 8 gene product 


6 


C14904 


Hs.45184 


Homo sapiens cDNA FLJ12284 fis, clone MAMMA1 001757 


6 


AA148984 


Hs.48849 


ESTs, Weakly similar to ALU4 HUMAN ALU SUBFAMILY SB2 SEQUENCE 
CONTAMINATION WARNING ENTRY [H. sapiens] 


6 


AW602463 


Hs.233370 


ESTs 


6 


X78342 


Hs.77313 


cyclin-dependent kinase (CDC2-like) 10 


6 


R12228 


NA 


ESTs 


6 


T61572 


Hs.79385 


Human clone 23574 mRNA sequence 


6 


AB020671 


Hs.84883 


KIAA0864 protein 


6 


AA236282 


Hs. 1723 18 


ESTs 


6 


AA323486 


Hs.325530 


Homo sapiens cDNA FLJ12335 fis, clone MAMMA1002219, highly similar to Rattus 
norvegicus rexo70 mRNA 


6 


BE247348 


Hs. 155499 


golgi-specific brefeldin A resistance factor 1 


6 


R05327 


Hs. 189726 


ESTs 


6 


T19228 


Hs.172572 


hypothetical protein FLJ20093 


6 


AW979298 


Hs.292896 


ESTs 


6 


AW812795 


Hs.337534 


ESTs, Moderately similar to 138022 hypothetical protein [H.sapiens] 


6 


AA4891 66 


Hs. 156933 


ESTs 


6 


BE218886 


Hs.282070 


ESTs 


6 


AP043244 


Hs.278439 


nucleolar protein 3 (apoptosis repressor with CARD domain) 


6 


AI076345 


Hs.373742 


ESTs 


6 


BE552155 


Hs.294035 


ESTs, Weakly similar to ALU5 HUMAN ALU SUBFAMILY SC SEQUENCE 
CONTAMINATION WARNING ENTRY rH.sapiensl 


6 


AW847208 


Hs.406201 


BANP homolog, SMAR1 homolog 


6 


AA834082 


Hs.307559 


ESTs 


6 


AF1 19847 


Hs.383393 


Homo sapiens PRO 1550 mRNA, partial cds 


6 


AW352170 


Hs.129086 


Homo sapiens cDNA FLJ12007 fis, clone HEMBB1001588 


6 


All 89587 


Hs.120915 


ESTs 


6 


AA677934 


Hs.l 17864 


ESTs 


6 


AA700946 


Hs.368238 


ESTs 


6 


AI684710 


Hs.111611 


ribosomal protein L27 
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6 


AW022213 


Hs.370487 


ESTs 


6 


AA580691 


Hs. 180789 


SI 64 protein 


6 


AW975663 


Hs.293404 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiensl 


6 


AW369770 


Hs.130351 


ESTs 


6 


AI3 80429 


Hs. 172445 


ESTs 


6 


AA356599 


Hs.173904 


ESTs 


6 


BE560954 


NA 


gb:601347719Fl NIH MGC 8 Homo sapiens cDNA clone 5\ mRNA sequence 


6 


AL040215 


Hs.7278 


cryptochrome 2 (photolyase-like)


6 


AI376551 


Hs.368882 


gb:te64e10.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone 3', mRNA sequence


6 


AI247472 


Hs.132965 


ESTs 


6 


AL038823 


Hs.12840 


Homo sapiens germline mRNA sequence 


6 


AW450103 


Hs.151124 


ESTs 


6 


AK001579 


Hs.25277 


hypothetical protein FLJ21065 


6 


W80462 


NA 


ESTs, Highly similar to ALU2 HUMAN ALU SUBFAMILY SB SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]


6 


AA037675 


Hs.152675 


ESTs 


6 


N72794 


Hs.37716 


hypothetical protein MGC39320 


6 


AI653672 


Hs.377610 


PNAS-123 


6 


BE091833 


NA 


gb:IL2-BT0731-260400-076-F04 BT0731 Homo sapiens cDNA, mRNA sequence


6 


AA854133 


Hs.310462


ESTs 


7 


AW511255


NA 


ESTs 


7 


AW182924


Hs.128790


ESTs 


7 


AW197644 


Hs.19107 


ESTs 


7 


AA215404 


Hs.355588 


ESTs 


7 


T82331 


Hs.31314 


calmodulin 2 (phosphorylase kinase, delta) 


7 


AI634046 


Hs.195175


CASP8 and FADD-like apoptosis regulator 


7 


AA421020 


Hs.208919 


ESTs 


7 


AI932995 


Hs.183475 


Homo sapiens clone 25061 mRNA sequence 


7 


AA579297 


Hs.26937 


brain and nasopharyngeal carcinoma susceptibility protein 


7 


AA831815 


Hs.370756 


ESTs, Weakly similar to I78885 serine/threonine-specific protein kinase [H.sapiens]


7 


AI732132 


Hs.109426


ESTs 


7 


T85301 


Hs.88974 


gb:yd78d06.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone 3' similar to
contains Alu repetitive element;, mRNA sequence 


7 


AI076259 


Hs.371556 


ESTs 


7 


AW979249 


NA 


gb:EST391359 MAGE resequences, MAGP Homo sapiens cDNA, mRNA sequence 


7 


AW298359 


Hs.221069 


ESTs 


7 


Z48633 


Hs.283742 


H. sapiens mRNA for retrotransposon 


7 


T92576 


Hs.191168 


ESTs 


7 


AI638706 


Hs.405567 


ESTs, Weakly similar to A47582 B-cell growth factor precursor [H.sapiens]


7 


BE158006 


Hs.212296 


ESTs 


7 


AF009267 


Hs.102238 


Homo sapiens clone FBA1 Cri-du-chat region mRNA 


8 


NM_030929.2


NA 


NM_030929.2| Homo sapiens hypothetical protein FKSG28 (FKSG28), mRNA 


8 


NA 


NA 


Target Exon 


8 


AI307226 


Hs.164421


ESTs 
8


AA135159 


Hs.203349 


Homo sapiens cDNA FLJ12149 fis, clone MAMMA1000421 


8 


AI277367 


Hs.47094 


ESTs 


8 


BE169995


Hs.180799


hypothetical protein FLJ22561 


8 


AW958181 


Hs.189998


ESTs 


8 


R08950 


Hs.272044 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]


8 


N58885 


Hs.289061 


gb:yv60a09.s1 Soares multiple sclerosis 2NbHMSP Homo sapiens cDNA clone 3',
mRNA sequence


8 


AA215539 


Hs.283643 


Homo sapiens cDNA FLJ11606 fis, clone HEMBA1003942 


8 


AA215701 


Hs.186541


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


8 


AA315703


Hs.199993 


ESTs, Weakly similar to ALUB_HUMAN !!!! ALU CLASS B WARNING ENTRY !!!
[H.sapiens]


8 


AW936874 


NA 


gb:RC1-DT0029-120100-011-f07 DT0029 Homo sapiens cDNA, mRNA sequence


8 


H84455 


Hs.40639 


ESTs 


8 


BE549205 


Hs.184488


flotillin 2 


8 


AA971576 


Hs.225951 


topoisomerase-related function protein 4-1 


8 


AW276866 


Hs.192715 


ESTs 


8 


AL047879 


Hs.293865 


ESTs, Weakly similar to ALU2_HUMAN ALU SUBFAMILY SB SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens] 


8 


AA657494 


NA 


gb:nt66f04.s1 NCI_CGAP_Pr3 Homo sapiens cDNA clone similar to gb:M35663
INTERFERON-INDUCED DOUBLE-STRANDED RNA-ACTIVATED PROTEIN
KINASE (HUMAN);, mRNA sequence 


8 


AA699325 


Hs.269880 


ESTs 


8 


AW510927 


Hs.371883 


ESTs 


8 


AU077018 


Hs.3235 


keratin 4 


8 


AA761490 


Hs.351250 


ESTs, Moderately similar to S65657 alpha-1C-adrenergic receptor splice form 2
[H.sapiens] 


8 


AW979008 


Hs.30738 


hypothetical protein FLJ10407


8 


AL045620 


Hs.131021 


hypothetical protein DKFZp434G118


8 


AW450681 


Hs.224941 


ESTs 


8 


N71597 


Hs.29698 


ESTs, Weakly similar to ZN91 HUMAN ZINC FINGER PROTEIN 91 [H.sapiens]


8 


U54727 


Hs.191445 


ESTs 


8 


AW891965 


Hs.367942 


histone deacetylase 3 


9 


NA 


NA 


C6001282:gi|4504223|ref|NP_000172.1| glucuronidase, beta [Homo sapiens]
gi|114963|sp|P082


9 


NM_138295.1


NA 


NM_138295.1| Homo sapiens polycystic kidney disease 1 like 1 (PKD1L1), mRNA 


9 


X15673


NA 


gb: Human pTR2 mRNA for repetitive sequence. 


9 


AA031663 


Hs.28802 


centaurin-alpha 2 protein 


9 


AW971350 


Hs.63386 


ESTs 


9 


AW085690 


Hs.63428 


ESTs, Weakly similar to Z195 HUMAN ZINC FINGER PROTEIN 195 [H.sapiens]


9 


AA079229 


NA 


gb:zm95f04.r1 Stratagene colon HT29 (937221) Homo sapiens cDNA clone 5' similar to
gb:J03626 URIDINE 5'-MONOPHOSPHATE SYNTHASE (HUMAN);, mRNA sequence


9 


AA205850 


Hs.122823 


thousand and one amino acid protein kinase 


9 


BE152644


NA 


gb:CM1-HT0329-250200-128-f09 HT0329 Homo sapiens cDNA, mRNA sequence


9 


AA311223


Hs.283091 


found in inflammatory zone 3 


9 


AI052628 


Hs.271570 


ESTs, Weakly similar to 2109260A B cell growth factor [H.sapiens]


9 


AA192455


Hs.22968 


Homo sapiens clone IMAGE:451939, mRNA sequence 


9 


R59096 


Hs.279939 


mitochondrial carrier homolog 1 


9 


U38847 


Hs.151518 


TAR (HIV) RNA-binding protein 1 
9


AW938336 


Hs.193767


ESTs 


9 


AI343641 


Hs.185798


ESTs 


9 


AB007867 


Hs.278311 


plexin B1


9 


N52821 


Hs.269412 


ESTs, Moderately similar to ALU7 HUMAN ALU SUBFAMILY SQ SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens] 


9 


AW972689 


Hs.200934 


ESTs 


9 


AA533447 


Hs.169610 


CD44 antigen (homing function and Indian blood group system) 


9 


AI056872 


Hs.133386 


ESTs 


9 


AA909619 


Hs.112668


ESTs 


9 


AA736872 


Hs.371634 


ESTs 


9 


R97804 


Hs.18723


ESTs 


9 


AA699991 


Hs.375200 


gb:zi69a09.s1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone 3' similar
to contains Alu repetitive element;, mRNA sequence 


9 


AI248285 


Hs.118348


ESTs 


9 


AI640635 


Hs.116468


EST 


9 


BE177778 


Hs.378703 


gb:RC1-HT0598-310300-012-f07 HT0598 Homo sapiens cDNA, mRNA sequence


9 


AA897108 


NA 


gb:am08a06.s1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone 3', mRNA sequence


9 


BE327015 


Hs.81988 


disabled homolog 2, mitogen-responsive phosphoprotein (Drosophila) (DAB2), mRNA.


9 


AI125436 


Hs.405924 


ESTs 


9 


BE562611 


Hs.348711 


gb:601336446F1 NIH_MGC_44 Homo sapiens cDNA clone 5', mRNA sequence


9 


AI084182 


Hs.370293 


Homo sapiens cDNA FLJ14209 fis, clone NT2RP3003346 


9 


AB037731 


Hs.7871:65 


hypothetical protein FLJ10081


9 


AI222165


Hs.144923


ESTs 


9 


AV654627 


Hs.271808 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


9 


AW297283 


Hs.192819 


ESTs 


9 


AI762475 


Hs.151327 


ESTs, Moderately similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]


9 


AF263462 


Hs.18376


KIAA1319 protein


9 


AI493546 


Hs.194737


KIAA0453 protein 


9 


BE395253


Hs.30861 


hypothetical protein MGC29956 (MGC29956), mRNA. 


9 


AW450536 


Hs.209260 


ESTs 


9 


R35917 


Hs.301338 


hypothetical protein FLJ12587


9 


AA748418 


Hs.33368 


hypothetical protein FLJ11175


9 


AA086123 


Hs.317177


ESTs 


9 


AA721140 


NA 


ESTs, Weakly similar to putative p150 [H.sapiens]


9 


AW892049 


NA 


gb:RC5-NT0035-260400-021-D11 NT0035 Homo sapiens cDNA, mRNA sequence


9 


AI279811 


Hs.298553 


Homo sapiens, clone IMAGE:3953631, mRNA, partial cds


9 


BE160204


Hs.390799 


gb:QV1-HT0413-010200-059-g08 HT0413 Homo sapiens cDNA, mRNA sequence


10 


NM_005936


NA 


NM_005936:Homo sapiens myeloid/lymphoid or mixed-lineage leukemia (trithorax 
(Drosophila) homolog); translocated to, 4 (MLLT4), mRNA. 


10 


AA508857 


Hs.369326 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens] 


10 


AA724738 


Hs.131034 


ESTs, Weakly similar to I78885 serine/threonine-specific protein kinase [H.sapiens]


10 


AA130992


Hs.2794 


gb:zo15e02.s1 Stratagene colon (937204) Homo sapiens cDNA clone 3' similar to
contains Alu repetitive element;contains element PTR5 repetitive element ;, mRNA 
sequence 


10 


AA160363


Hs.269956 


ESTs 


10 


H69480 


Hs.141304 


ESTs 
10


AI080042 


Hs.377298 


ribosomal protein S24 


10 


BE549343 


Hs.82208 


acyl-Coenzyme A dehydrogenase, very long chain 


10 


AW967054 


Hs.206312 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


10 


AI821614 


Hs.87409 


ESTs 


10 


AA811933 


Hs.104234 


ESTs 


10 


AK000753 


Hs.92374 


hypothetical protein 


10 


AA811657 


Hs.220913 


ESTs 


10 


AI199510 


Hs.267912 


ESTs, Weakly similar to ALU7 HUMAN ALU SUBFAMILY SQ SEQUENCE
CONTAMINATION WARNING ENTRY [H.sapiens]


10 


AW469240 


NA 


ESTs 


10 


AW970512 


NA 


gb:EST382593 MAGE resequences, MAGK Homo sapiens cDNA, mRNA sequence 


10 


AW057782 


Hs.293053 


ESTs 


10 


AI868634 


Hs.246358 


ESTs, Weakly similar to idzzdu hypothetical protein l loti 1.5 - Caenorhabditis elegans
[C.elegans] 


10 


BE300073 


Hs.279860 


tumor protein, translationally-controlled 1 


10 


AA641201 


Hs.222051 


ESTs 


10 


AL118754


NA 


gb:DKFZp761P1910_r1 761 (synonym: hamy2) Homo sapiens cDNA clone
DKFZp761P1910 5', mRNA sequence


10 


BE503432 


Hs.284153 


Fanconi anemia, complementation group A 


10 


AB002375 


Hs.156814 


KIAA0377 gene product 


10 


AA632817 


Hs.190316 


ESTs 


10 


AA372796 


NA 


ESTs, Weakly similar to AF161356_1 HSPC093 [H.sapiens]


10 


AK001016 


Hs.356519 


hypothetical protein FLJ10154 


10 


AI553741 


Hs.98791 


ESTs 


10 


AW369620 


Hs.33944 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE
CONTAMINATION WARNING ENTRY [H.sapiens]


10 


AA459316 


Hs.99743 


ESTs 


10 


AW967807 


Hs.13797


ESTs 


10 


AW972227 


Hs.163986


Homo sapiens cDNA: FLJ22765 fis, clone KAIA1180


10 


AW972771 


Hs.292471 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE
CONTAMINATION WARNING ENTRY [H.sapiens]


10 


AI131140 


Hs.372186 


ESTs 


10 


AA570710 


Hs.349344 


hypothetical protein BC001573 


10 


AA832055 


NA 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE
CONTAMINATION WARNING ENTRY [H.sapiens] 


10 


AA604405 


NA 


gb:no87h09.s1 NCI_CGAP_AA1 Homo sapiens cDNA clone 3', mRNA sequence


10 


AI174777 


Hs.400372 


Homo sapiens PRO2492 mRNA, complete cds


10 


AI611172 


Hs.189578


ESTs 


10 


AA460479 


Hs.321707 


KIAA0742 protein 


10 


AI378570 


Hs.116397


ESTs 


10 


AA648983 


Hs.370514 


ESTs 


10 


AI285970 


Hs.183817 


ESTs 


10 


AW015736 


Hs.211378


ESTs 


10 


T97301 


Hs.18026


ESTs 


10 


BE301871 


Hs.4867 


mannosyl (alpha-1,3-)-glycoprotein beta-1,4-N-acetylglucosaminyltransferase, isoenzyme
B 


10 


AW021655 


Hs.194441


ESTs 


10 


AF220263 


Hs.193920 


MOST2 protein 
10


W90446 


Hs.137324


ESTs 


10 


AI418466 


Hs.33665 


ESTs 


10 


AA704899 


Hs.291651 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


10 


AI433540 


Hs.405182 


gb:ti69g05.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone 3', mRNA sequence


10 


R55822 


Hs.4268 


ESTs 


10 


AA810788 


Hs.123337 


ESTs 


10 


AI660898 


Hs.119533


ESTs 


10 


AL138461 


Hs.323084 


tRNA-guanine transglycosylase 


10 


AI570700 


Hs.128025 


ESTs 


10 


BE244622 


Hs.8084 


hypothetical protein dJ465N24.2.1 


10 


AA983913 


Hs.368672 


ESTs 


10 


AA355525 


Hs.159604


cysteinyl-tRNA synthetase 


10 


AI025499 


Hs.370408


ESTs 


10 


AI280341 


Hs.166571


ESTs 


10 


AV651680 


Hs.208558 


ESTs 


10 


AI674383 


Hs.22891 


solute carrier family 7 (cationic amino acid transporter, y+ system), member 8


10 


R07355 


Hs.15464


Homo sapiens cDNA: FLJ21351 fis, clone COL02762 


10 


AI733819 


Hs.145557 


ESTs 


10 


AL137730


Hs.14235 


hypothetical protein FLJ20008; KIAA1839 protein


10 


AW205632 


Hs.211198 


ESTs 


10 


AI962234 


Hs.196102 


ESTs 


10 


AI651803 


Hs.370331 


ESTs 


10 


R94570 


Hs.266869 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]


10 


AI540842 


Hs.61082 


ESTs 


10 


AW835616


Hs.372534 


gb:RC5-LT0054-140200-013-D01 LT0054 Homo sapiens cDNA, mRNA sequence 


11 


NA 


NA 


Target Exon 


11 


AA045899 


Hs.146170 


hypothetical protein FLJ22969 


11 


T82427 


Hs.194101


Homo sapiens cDNA: FLJ20869 fis, clone ADKA02377


11 


AU077343 


Hs.43910 


CD164 antigen, sialomucin 


11 


AW206670 


Hs.50748 


chromosome 21 open reading frame 18 


11 


AA525225 


Hs.334630 


Homo sapiens cDNA FLJ14462 fis, clone MAMMA1000241


11 


BE181659 


NA 


gb:QV1-HT0638-070500-191-g07 HT0638 Homo sapiens cDNA, mRNA sequence


11 


BE327036 


Hs.172813 


Rho guanine nucleotide exchange factor (GEF) 7 (ARHGEF7), transcript variant 1,
mRNA.


11 


AF022375 


Hs.73793 


vascular endothelial growth factor 


11 


AA456195 


Hs.10056


hypothetical protein FLJ14621 


11 


N92571 


Hs.54808 


ESTs 


11 


L19067


Hs.75569 


v-rel avian reticuloendotheliosis viral oncogene homolog A (nuclear factor of kappa light
polypeptide gene enhancer in B-cells 3 (p65))


11 


AW938668 


NA 


gb:PM1-DT0063-160200-003-c07 DT0063 Homo sapiens cDNA, mRNA sequence


11 


AW452420 


Hs.248678 


ESTs 


11 


T77127 


Hs.375694 


gb:yd72a05.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone 5', mRNA
sequence 


11 


R94977 


Hs.35416 


PRO0132 protein 


11 


AA229781 


Hs.336812 


ESTs 
AJ224901


Hs.109526


zinc finger protein 198




AA016188 


Hs.111244


hypothetical protein 




AV647015 


Hs.349256 


paired immunoglobulin-like receptor beta 




NM_004428


Hs.1624


ephrin-A1


11


BE244625 


Hs.125742 


leucine-rich neuronal protein 




AA505691 


Hs.145696 


splicing factor (CC1.3) 


11


AA469042 


Hs.164410 


chromosome 16 open reading frame 7




AA494172 


Hs.194417 


ESTs 


11


BE397531 


Hs.182237


POU domain, class 2, transcription factor 1 


11


AW969656 


NA 


gb:EST381733 MAGE resequences, MAGK Homo sapiens cDNA, mRNA sequence




AL023754 


Hs.199068


similar to calcium/calmodulin dependent protein kinases 


11


AW793022 


Hs.323463 


hypothetical protein 


11


AA487264 


Hs.154974


Homo sapiens mRNA; cDNA DKFZp667N064 (from clone DKFZp667N064) 


11 


AI874223 


Hs.293560 


ESTs


11 


AA761378 


Hs.192013 


ESTs 


11


AK000777 


Hs.272197 


Homo sapiens cDNA FLJ20770 fis, clone COL06509 




R31178 


Hs.287820 


fibronectin 1 


11 


AL043683 


Hs.8173 


hypothetical protein FLJ10803


11


BE242758 


Hs.190223


ESTs, Moderately similar to T7Q7RS hypothetical protein P^ztTVL 14 - Caenorhabditis
elegans [C.elegans]


11


AI674779 


Hs.126744


ESTs 


11


AA586950 


Hs.373755 


Homo sapiens mRNA; cDNA DKFZp761G18121 (from clone DKFZp761G18121);
complete cds 


11


AW273261 


Hs.216292 


ESTs


11


BE005398 


Hs.375092 


gb:CM1-BN0116-150400-189-h02 BN0116 Homo sapiens cDNA, mRNA sequence


11


T51910 


Hs.9333 


ESTs 


11


AL042425 


Hs.283976 


hypothetical protein PRO2389


11


AW975684 


Hs.294014 


ESTs 


11


AA745618 


Hs.110613


BANP homolog, SMAR1 homolog 


11


AA279341 


Hs.174151 


aldehyde oxidase 1 


11


AW753588 


Hs.86998 


Homo sapiens cDNA FLJ10205 fis, clone HEMBA1004954


11


AI954880 


Hs.372464 


ESTs 


11


AW609170 


Hs.398050 


ESTs 


11


AI420611 


Hs.153934


core-binding factor, runt domain, alpha subunit 2; translocated to, 2 




AI887875 


Hs.307434 


ESTs 


11 


H15560


Hs.131833 


ESTs 


11


AI038316 


Hs.156317 


gb:UAtotuO.Al Soares_total_fetus_Nb2HF8_9w Homo sapiens cDNA clone 3', mRNA
sequence




T47764 


Hs.132917 


ESTs 




R69077 


Hs.193348 


ESTs, Moderately similar to I78885 serine/threonine-specific protein kinase [H.sapiens]




AI073491 


Hs.269887 


ESTs, Highly similar to KPBB HUMAN PHOSPHORYLASE B KINASE BETA 
REGULATORY CHAIN [H.sapiens]




R44284 


Hs.2730 


heterogeneous nuclear ribonucleoprotein L 




AW594695 


Hs.167046


ESTs 




AI679753 


Hs.371392 


ESTs, Weakly similar to ALU7 HUMAN ALU SUBFAMILY SQ SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]
11


H22953 


Hs.137551 


ESTs 


11 


BE546846 


Hs.195048 


ESTs 


11 


AA010200 


Hs.175551 


ESTs 


11 


T98171 


Hs.185675


ESTs 


11 


AA046457 


Hs.60677 


ESTs 


11 


AW102941 


Hs.211265 


ESTs 


11 


AA025386 


Hs.613 11:24 


ESTs, Weakly similar to S10590 cysteine proteinase [H.sapiens]


11 


AF044924 


Hs.30792 


hook2 protein 


11 


R41874 


Hs.22164 


AD038 


11 


AI978583 


Hs.329273 


ESTs, Weakly similar to I78885 serine/threonine-specific protein kinase [H.sapiens]


11 


BE620712 


Hs.33026 


hypothetical protein PP2447 


11 


AW362901 


Hs.68864 


lipase, member H (LIPH), mRNA. 


11 


AI905216 


NA 


gb:ROBT078-260499-024 BT078 Homo sapiens cDNA, mRNA sequence 


11 


AA889982 


Hs.271826 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


11 


AA320038 


NA 


gb:EST22383 Adipose tissue, white II Homo sapiens cDNA 5' end, mRNA sequence


12 


M22333 


NA 


Target Exon 


12 


H90988 


Hs.334503 


hypothetical protein MGC12386 


12 


AA194952


Hs.36093 


Homo sapiens cDNA FLJ12885 fis, clone NT2RP2003988 


12 


AI860558 


Hs.62112 


zinc finger protein 207


12 


AA378739 


Hs.187711


ESTs 


12 


AW511443 


Hs.258110 


ESTs 


12 


AF075113 


Hs.384696 


gb:Homo sapiens full length insert cDNA YU78B07 


12 


AI357813 


Hs.239926 


sterol-C4-methyl oxidase-like 


12




Jris. 1 J'tozz 




12




ij c m 1 oo 


ESTs 


12 


AI827988 


Hs.240728 


ESTs, Moderately similar to PC4259 ferritin associated protein [H.sapiens] 


12 


AW340925 


Hs.110855


ESTs 


12 


N72596 


NA 


gb:za46f04.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone 3' similar to
;, mRNA sequence 


13 


AI125507 


Hs.130829 


transformer-2 alpha (htra-2 alpha) 


13 


AA534222 


NA 


olrm'71 HO 9 cl NCI_CGAP_AA1 Homo sapiens cDNA clone 3' similar to contains Alu
repetitive element;, mRNA sequence


13 


AW976511 


Hs.112592


ESTs 


14 


AI801565


Hs.200113 


Homo sapiens cDNA FLJ11379 fis, clone HEMBA1000469 


14 


H13016 


Hs.198281


pyruvate kinase, muscle 


14 


AA521132 


Hs.48576 


excision repair cross-complementing rodent repair deficiency, complementation group 5 
(xeroderma pigmentosum, complementation group G (Cockayne syndrome)) 


14 


BE259015 


Hs.74576 


GDP dissociation inhibitor 1 


14 


AI912061 


Hs.55016 


hypothetical protein FLJ21935 


14 


AA093428 


Hs.352337 


ESTs 


14 


H70814 


Hs.2336S 


Homo sapiens clone FLC0578 PRO2852 mRNA, complete cds


14 


AA197305


Hs.123075 


ESTs, Weakly similar to A46010 X-linked retinopathy protein [H.sapiens]


14 


H77859 


Hs.377218 


reticulon 4 


14 


AW449855 


Hs.96557 


Homo sapiens cDNA FLJ12727 fis, clone NT2RP2000027
14


AI922821 


Hs.32433 


ESTs 


14 


BE281303 


Hs.299148 


hypothetical protein FLJ21801 


14 


H82114 


Hs.74170 


ESTs 


14 


AI149880 


Hs.188809


ESTs 


14 


AF169255


Hs.241377 


5-hydroxytryptamine (serotonin) receptor 3B 


14 


AI584156 


Hs.105640


Homo sapiens, clone IMAGE:4139775, mRNA, partial cds 


14 


NM_013937


Hs.247861 


olfactory receptor, family 11, subfamily A, member 1


14 


AW023610 


Hs.370582 


ESTs 


14 


AA516420


Hs.352340 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


14 


NM_014159


Hs.6947 


HSPC069 protein 


14 


AI658666 


Hs.352381 


RNA binding motif protein 4 


14 


AA551569 


Hs.272034 


hypothetical protein PRO2822


14 


AA700439 


Hs.188490


ESTs 


14 


BE326856 


Hs.118795


hypothetical protein FLJ10008


14 


AW080237 


Hs.252884 


ESTs 


14 


AL137480


Hs.6834 


KIAA1014 protein 


14 


BE559786 


Hs.375037 


hypothetical protein FLJ30092 


14 


AW206035 


Hs.356457 


ESTs 


14 


AI743317 


Hs.283622 


ESTs, Weakly similar to ALU5 HUMAN ALU SUBFAMILY SC SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]


14 


AI923953 


Hs.131830 


ESTs 


14 


H80137 


Hs.157246


ESTs 


14 


AA228092 


Hs.42656 


KIAA1681 protein 


14 


AI523875 


NA 


gb:tg97d04.x1 NCI_CGAP_CLL1 Homo sapiens cDNA clone 3' similar to contains Alu
repetitive element;contains element THR THR repetitive element ;, mRNA sequence 


14 


AI619957 


NA 


ESTs 


14 


AA019344


Hs.2055 


ubiquitin-activating enzyme E1 (A1S9T and BN75 temperature sensitivity
complementing) 


14 


AF070582


Hs.26118 


hypothetical protein MGC13033


14 


AF095687 


Hs.26937 


brain and nasopharyngeal carcinoma susceptibility protein 


14 


AW452189 


Hs.27263 


KIAA1458 protein


14 


N58327 


Hs.302755 


ESTs 


15 


NA 


NA 


Target Exon 


15 


N33937 


Hs.10336 


ESTs 


15 


BE349470 


Hs.99918 


mucin 6, gastric 


15 


AW851603 


Hs.278831 


gb:MR2-CT0222-201099-001-f04 CT0222 Homo sapiens cDNA, mRNA sequence


15 


BE091833 


NA 


gb:IL2-BT0731-260400-076-F04 BT0731 Homo sapiens cDNA, mRNA sequence 


15 


BE156536 


Hs.6217 


gb:QV0-HT0368-310100-091-h10 HT0368 Homo sapiens cDNA, mRNA sequence


15 


AW795793 


Hs.356181 


Homo sapiens cDNA FLJ12257 fis, clone MAMMA1001501, highly similar to CALPAIN 
1, LARGE [CATALYTIC] SUBUNIT (EC 3.4.22.17) 


15 


AW952192 


Hs.406618 


guanine nucleotide binding protein (G protein), alpha stimulating activity polypeptide 1 


15 


AA962181 


Hs.111219 


ESTs, Moderately similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]


15 


AA226377 


Hs.193950


ESTs 


15 


AA317036


Hs.301771 


transforming growth factor, beta-induced, 68kD 


15 


T18988 


Hs.293668 


ESTs 
15


AA482027 


Hs.142569 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


15 


AA521410 


Hs.41371 


ESTs 


15 


AW971248 


Hs.291289 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens] 


15 


AA502663 


Hs.145037 


ESTs 


15 


AA534908 


Hs.2860 


POU domain, class 5, transcription factor 1 


15 


AA775208 


Hs.136423


ESTs 


15 


AB029396 


Hs.381050 


beta-1,3-glucuronyltransferase 1 (glucuronosyltransferase P)


15 


AW022133 


Hs.189838


ESTs 


15 


AA608955 


Hs.109653


ESTs 


15 


AI033647 


Hs.121001 


Homo sapiens, clone IMAGE:3460280, mRNA


15 


AA704806 


Hs.143842 


ESTs, Weakly similar to 2004399A chromosomal protein [H.sapiens]


15 


AI690734 


Hs.62112 


Homo sapiens cDNA: FLJ22562 fis, clone HSI01814


15 


AL353957 


Hs.284181 


hypothetical protein DKFZp434P0531 


15 


AA780020 


Hs.21320 


postreplication repair protein hRAD18p 


15 


H87407 


Hs.348407 


chorionic gonadotropin, beta polypeptide 


15 


AA833902 


Hs.270745 


ESTs 


15 


AA885234 


Hs.125774 


ESTs 


15 


AI792868 


Hs.135365


ESTs 


15 


AI762154 


Hs.315054


Homo sapiens cDNA FLJ14014 fis, clone HEMBA1000290


15 


AA010269 


Hs.16241 


ESTs 


15 


AW500269 


Hs.21264 


KIAA0782 protein 


15 


AL049390 


Hs.22689 


Homo sapiens mRNA; cDNA DKFZp58601318 (from clone DKFZp58601318) 


15 


AA011518 


Hs.271778 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


15 


AW451469 


Hs.209990 


ESTs 


15 


AW389509


Hs.223747 


ESTs 


15 


AI924228 


Hs.115185


ESTs, Moderately similar to PC4259 ferritin associated protein [H.sapiens]


15 


AI821940 


Hs.72071 


hypothetical protein FLJ20038


15 


BE142728 


NA 


gb:MR0-HT0157-021299-004-d08 HT0157 Homo sapiens cDNA, mRNA sequence


16 


NM_020962.1


NA 


NM_020962.1| Homo sapiens likely ortholog of mouse neighbor of Punc E11 (NOPE),


16 


AJ237589.1


NA 


AJ237589.1|HSA237589 Homo sapiens mRNA for T-box transcription factor (TBX20 
gene), 


16 


AA356192


Hs.193482


Homo sapiens cDNA FLJ11903 fis, clone HEMBB1000030


16 


AA302840 


Hs.403902 


gb:EST10534 Adipose tissue, white I Homo sapiens cDNA 3' end, mRNA sequence


16 


AW515373


Hs.271249 


Homo sapiens cDNA FLJ13580 fis, clone PLACE1008851 


16 


AA136569


Hs.356559 


KIAA0187 gene product 


16 


AI567436 


Hs.16258 


Homo sapiens cDNA FLJ11699 fis, clone HEMBA1005047, highly similar to
RAS-RELATED PROTEIN RAB-24


16 


R43528 


Hs.388002 


ESTs 


16 


AA828750 


NA 


gb:od76a07.s1 NCI_CGAP_Ov2 Homo sapiens cDNA clone, mRNA sequence


16 


AA676544 


Hs.171545 


HIV-1 Rev binding protein 


16 


AW972872 


Hs.293736 


ESTs 


16 


AI670057 


Hs.199882 


ESTs 


16 


AF065215 


Hs.198161 


phospholipase A2, group IVB (cytosolic) 


16 


AA456883 


Hs.79889 


monocyte to macrophage differentiation-associated 
16


R51790 


Hs.239483 


Human clone 23933 mRNA sequence 


16 


AA478883 


Hs.273766 


ESTs 


16 


AA572949 


Hs.207566 


ESTs 


16 


AW207279 


Hs.271786 


ESTs, Weakly similar to PC4395 mucin 3 [H.sapiens] 


16 


AF124150 


Hs.371417 


ESTs 


16 


AW203986 


Hs.213003 


ESTs 


16 


AW749865 


NA 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]


16 


T85104 


Hs.194477


E3 ubiquitin ligase SMURF2 


16 


AW238673 


Hs.146038


ESTs 


16 


AI908538 


Hs.133000


ESTs, Weakly similar to S26689 hypothetical protein hcl - mouse [M.musculus]


16 


AW771958 


Hs.175437


ESTs, Moderately similar to PC4259 ferritin associated protein [H.sapiens]


16 


AI766732 


Hs.210628


ESTs 


16 


AI903313 


Hs.34579 


ESTs, Moderately similar to ALU6 HUMAN ALU SUBFAMILY SP SEQUENCE
CONTAMINATION WARNING ENTRY [H.sapiens]


16 


AW974642 


Hs.366446 


ESTs, Weakly similar to ALU1 HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]


17 


D00159 


NA 


gb:Homo sapiens gene for pancreatic elastase I, partial cds. 


17 


AI204033 


Hs.379039 


tumor suppressor deleted in oral cancer-related 1 


17 


T40707 


Hs.270862 


ESTs 


17 


AW971303 


Hs.241869 


ESTs 


17 


AA320525 


Hs.201076 


ESTs 


17 


AL110203


Hs.138411 


Homo sapiens mRNA; cDNA DKFZp586J1922 (from clone DKFZp586J1922)


17 


AW970116 


Hs.310616 


ESTs 


17 


AW971146 


Hs.293187 


ESTs 


17 


T55958 


Hs.384169 


ffhrvh^SfO^ rl Stratagene fetal spleen (937205) Homo sapiens cDNA clone, mRNA
sequence


17 


AW444619 


Hs.138211 


ESTs 


17 


AI239832 


Hs.15617 


ESTs, Weakly similar to ALU4 HUMAN ALU SUBFAMILY SB2 SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]


17 


T85314 


Hs.54629 


thioredoxin-like 


17 


R10799 


Hs.191990 


ESTs 


17 


W69171 


Hs.267263 


hypothetical protein FLJ22283 (FLJ22283), mRNA.


18 


AA682384 


NA 


ESTs 


19 


AW861225 


Hs.110613


BANP homolog, SMAR1 homolog 


20 


BRCA1b


NA 


Eos Control: 






WO 2004/090547 



PCT/US2004/010465 



Table 2: Cluster 1 Genes Indicative of Colorectal Cancer 



Cluster 


Exemplar 
Accession 


UniGene ID






NA 


Hs.76297


G protein-coupled receptor kinase 6 (GPRK6), mRNA.




NM_173483


NA 


NM_173483 Homo sapiens hypothetical protein FLJ39501 (FLJ39501),




NM_003468.2


NA 


NM_003468.2| Homo sapiens frizzled homolog 5 (Drosophila) (FZD5), mRNA




NA 


NA 


Target Exon




AC007050.25 


NA 


ESTs 




NA 


NA 


Target Exon 




W25945 


Hs.8173 


hypothetical protein FLJ10803


1


AW054922 


Hs.53478 


Homo sapiens cDNA FLJ12366 fis, clone MAMMA1002411


1 


AW847814 


Hs.289005 


Homo sapiens cDNA: FLJ21532 fis, clone COL06049 


1 


BE244200 


Hs.406243 


KIAA0410 gene product 


1


AW514668


Hs.194258


ESTs, Moderately similar to ALU5 HUMAN ALU SUBFAMILY SC SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens] 


1


AA249096 


Hs.32793 


ESTs 


1 


L26953 


Hs.1010 


regulator of mitotic spindle assembly 1 


1 


AI381687 


Hs.404198 


ESTs 


1


N99638 


Hs.87409 


gb:za39g11.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone 5' similar to
contains Alu repetitive element;, mRNA sequence 


1


AI205785 


Hs.190153 


ESTs 




AW965212 


Hs.278871 


hypothetical protein FLJ30921 (FLJ30921), mRNA.




AL119442


Hs.380968


eukaryotic translation initiation factor 4 gamma, 2 




AA358045


NA 


gb:EST66944 Fetal lung III Homo sapiens cDNA 5' end similar to EST containing Alu repeat,
mRNA sequence




AL050276 


Hs.159456


zinc finger protein 288




AI052358


Hs.131741


ESTs 




AW976570


Hs.97387


ESTs 




AI936504 


Hs.2083 


CDC-like kinase 1 




AA400079 


Hs.257854 


ESTs 




AW883367 


Hs.356546 


hypothetical protein MGC5306 




AA417696 


Hs.372121 


ESTs 




AA470152 


Hs.368209 


ESTs 




AW971375 


Hs.292921 


ESTs 




AW971070 


Hs.291160


ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens]




T87431


Hs.190738


ESTs 




AA531129


Hs.190297


ESTs 


1


AW439330 


Hs.256889 


ESTs, Weakly similar to 2109260A B cell growth factor [H.sapiens]




AW157424


Hs.280685 


ESTs, Weakly similar to I38022 hypothetical protein [H.sapiens]




AB040966 


Hs.83575 


KIAA1533 protein 




AW188370


Hs.250383 


Homo sapiens cDNA FLJ14279 fis, clone PLACE1005574 




AA628539 


Hs.57783 


Homo sapiens eukaryotic translation initiation factor 3, subunit 9 eta, 116kDa (EIF3S9)




AA640770 


Hs.200994 


EST 




AA664078 


NA 


gb:ac04a05.s1 Stratagene lung (937210) Homo sapiens cDNA clone 3' similar to contains Alu
repetitive element;, mRNA sequence 




AA886511 


Hs.189282


Homo sapiens cDNA: FLJ21429 fis, clone COL04205 
1


AA830893 


Hs.119769


ESTs 


1


BE327477 


Hs.166941


ESTs 


1


AI821940 


Hs.72071


hypothetical protein FLJ20038 




AL1 37723 


Hs.5855 


Homo sapiens mRNA; cDNA DKFZp434D0818 (from clone DKPZp434D0818) 


1 


AA769874 


Hs. 155287 


ubiquitin-protein isopeptide ligase (E3) 


1 


AI126162 


Hs.129037 


ESTs 


1 


AW748336 


Hs.168052 


KIAA0421 protein 


1 


AW083789 


Hs.124620 


ESTs 


1 


AI034357 


Hs.211194 


ESTs, Weakly similar to ALU8_HUMAN ALU SUBFAMILY SX SEQUENCE 
CONTAMINATION WARNING ENTRY [H.sapiens] 


1 


AW827419 


Hs.144139 


ESTs 


1 


BE262656 


Hs.32603 


hypothetical protein MGC3279 similar to collectins 


1 


AW469180 


Hs.346398 


ESTs 


1 


AI492857 


NA 


gb:th72h08.x1 Soares NhHMPu S1 Homo sapiens cDNA clone 3', mRNA sequence 


1 


AW451347 


Hs.175862 


ESTs 




AI698091 


Hs.107845 


ESTs 


1 


AJ010046 


Hs.25155 


neuroepithelial cell transforming gene 1 




AL043983 


Hs.125063 


Homo sapiens cDNA FLJ13825 fis, clone THYRO1000558 




AW382884 


Hs.5320 


ESTs 




BE378541 


Hs.279815 


cysteine sulfinic acid decarboxylase-related protein 2 




R66282 


Hs.20247 


ESTs, Weakly similar to S65657 alpha-1C-adrenergic receptor splice form 2 [H.sapiens] 




BE086548 


Hs.42346 


calcineurin-binding protein calsarcin-1 




AA907305 


Hs.36475 


ESTs 
Table 3: Cluster 4 Genes indicative of Metastatic Colorectal Cancer 



Cluster | Exemplar Accession | UniGene ID    | UniGene Title
4       | AA130986           | HS.Z / 1 OZ / | ESTs
4       | T64896             | Hs.406798     | Homo sapiens cDNA FLJ11533 fis, clone HEMBA1002678
4       | AA132637           | Hs.15396      | Homo sapiens, clone IMAGE:3948909, mRNA, partial cds
4       | AA317962           | Hs.249721     | ESTs, Moderately similar to PC4259 ferritin associated protein [H.sapiens]
4       | AW167439           | Hs.190651     | Homo sapiens cDNA FLJ13625 fis, clone PLACE1011032
4       | AW452823           | Hs.135268     | ESTs
4       | AA132255           | Hs.143951     | ESTs
4       | D83782             | Hs.78442      | SREBP CLEAVAGE-ACTIVATING PROTEIN
4       | AI690465           | Hs.201661     | ESTs, Weakly similar to JC5238 galactosylceramide-like protein, GCP [H.sapiens]
4       | R07785             | Hs.429867     | ESTs
4       | AL041465           | Hs.182982     | golgin-67
4       | AW183695           | Hs.370907     | ESTs
4       | AW276914           | Hs.423341     | Homo sapiens clone IMAGE:713177, mRNA sequence
4       | U50535             | Hs.110630     | Human BRCA2 region, mRNA sequence CG006
4       | AF073931           | Hs.122359     | calcium channel, voltage-dependent, alpha 1H subunit
4       | AW341131           | Hs.146345     | ESTs
4       | BE176694           | Hs.279860     | tumor protein, translationally-controlled 1
4       | AW963118           | Hs.161784     | ESTs
4       | AW513691           | Hs.270149     | ESTs, Weakly similar to 2109260A B cell growth factor [H.sapiens]
4       | BE173380           | Hs.381903     | ESTs
4       | Z29067             | Hs.2236       | NIMA (never in mitosis gene a)-related kinase 3
4       | AA425310           | Hs.155766     | ESTs, Weakly similar to A47582 B-cell growth factor precursor [H.sapiens]
4       | AW973253           | Hs.292689     | ESTs
4       | AA453987           | Hs.144802     | ESTs
4       | AA612710           | Hs.284148     | ESTs
4       | AA830335           | Hs.105273     | ESTs
4       | AW970859           | Hs.313503     | ESTs
4       | AA532718           | Hs.178604     | ESTs
4       | AI459519           | Hs.314437     | clone IMAGE:4607209, mRNA sequence [H.sapiens]
4       | BE263901           | Hs.381222     | ESTs, Weakly similar to S37431 ankyrin 2, neuronal long splice form [H.sapiens]
4       | AI301080           | Hs.35276      | KIAA0852 protein
4       | AW975009           | Hs.292214     | ESTs, Weakly similar to A46010 X-linked retinopathy protein [H.sapiens]
4       | AA677540           | Hs.117064     | ESTs
4       | H74319             | Hs.188620     | ESTs
4       | AI800041           | Hs.369733     | ESTs
4       | AL360140           | Hs.176005     | Homo sapiens mRNA full length insert cDNA clone EUROIMAGE 113222
4       | AF134160           | Hs.7327       | claudin 1
4       | AI982794           | Hs.159473     | ESTs
4       | AK001631           | Hs.8083       | hypothetical protein FLJ10769
4       | W22152             | Hs.282929     | ESTs
4       | H77824             | NA            | ESTs
4       | AU076643           | Hs.313        | secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation 1)
4       | AW958124           | Hs.142442     | HP1-BP74
4       | AL137714           | Hs.356298     | hypothetical protein LOC58481
4       | AA001266           | Hs.133521     | ESTs
4       | AL133100           | Hs.377705     | hypothetical protein FLJ20531
4       | AA001615           | Hs.84561      | ESTs
4       | AA568515           | Hs.293510     | ESTs
4       | AW079749           | Hs.184719     | ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
4       | AL045285           | Hs.277401     | bromodomain adjacent to zinc finger domain, 2A
4       | AI740647           | Hs.141012     | ESTs, Weakly similar to ALU1_HUMAN ALU SUBFAMILY J SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
4       | AW976347           | Hs.76966      | ESTs
4       | AI191811           | Hs.54629      | ESTs
Table 4: Cluster 1 Top Targets 



Training Data Effective Weights | SEQ ID NOs      | Exemplar Accession | UniGene ID | UniGene Title
1.202                           | 8 & 29          | BE262656           | Hs.32603   | hypothetical protein MGC3279 similar to collectins
1.048                           | 9, 18 & 30      | AW382884           | Hs.5320    | MGC16824 Esophageal cancer associated protein
0.958                           | 10, 11, 31 & 32 | AW847814           | Hs.289005  | Homo sapiens cDNA: FLJ21532 fis, clone COL06049
0.773                           | 12 & 33         | W25945             | Hs.8173    | hypothetical protein FLJ10803
0.763                           | 13, 19 & 34     | AI698091           | Hs.107845  | ESTs
0.666                           |                 | AI205785           | Hs.190153  | Unnamed protein product [H.sapiens]
0.625                           |                 | AL043983           | Hs.125063  | Homo sapiens cDNA FLJ13825 fis, clone THYRO1000558
0.503                           |                 | AA531129           | Hs.190297  | ESTs
0.492                           |                 | NM_173483          | NA         | ESTs
0.352                           |                 | BE327477           | Hs.166941  | ESTs
0.332                           |                 | AI936504           | Hs.2083    | CDC-like kinase 1
v.Uj I                          |                 | R66282             | Hs.20247   | ESTs, Weakly similar to S65657 alpha-1C-adrenergic receptor splice form 2 [H.sapiens]
                                |                 | AC007050.25        | NA         | ESTs
0.023                           |                 | BE378541           | Hs.279815  | Cysteine sulfinic acid decarboxylase-related protein 2
-0.028                          |                 | AA907305           | Hs.36475   | ESTs
-0.098                          |                 | AW748336           | Hs.168052  | KIAA0421 protein
-0.466                          |                 | AI034357           | Hs.211194  | ESTs, Weakly similar to ALU8_HUMAN ALU SUBFAMILY SX SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
-0.666                          |                 | AW976570           | Hs.97387   | ESTs
-0.996                          | 14, 20 & 35     | AW054922           | Hs.53478   | Homo sapiens cDNA FLJ12366 fis, clone MAMMA1002411
-1.065                          | 15, 21 & 36     | AA830893           | Hs.119769  | ESTs
Table 5: Cluster 4 Top Targets 



Training Data Effective Weights | SEQ ID NOs      | Exemplar Accession | UniGene ID | UniGene Title
2.041                           | 1 & 22          | AU076643           | Hs.313     | secreted phosphoprotein 1 (osteopontin, bone sialoprotein I, early T-lymphocyte activation 1)
1.644                           | 2 & 23          | AA132637           | Hs.15396   | Homo sapiens, clone IMAGE:3948909, mRNA, partial cds
1.244                           | 3, 16, & 34     | AW276914           | Hs.423341  | Homo sapiens clone IMAGE:713177, mRNA sequence
1.171                           | 4 & 25          | AL133100           | Hs.377705  | hypothetical protein FLJ20531 - NM_017865
1.102                           | 5, 17 & 26      | AA612710           | Hs.284148  | ESTs
                                | 6 & 27          | AL137714           | Hs.356298  | hypothetical protein LOC58481
U.4oo                           |                 | AI800041           | Hs.369733  | ESTs
U.4J /                          |                 | AI982794           | Hs.159473  | ESTs
0.217                           |                 | AL045285           | Hs.277401  | BAZ2A. Bromodomain adjacent to zinc finger domain 2A
0.138                           |                 | T64896             | Hs.406798  | Homo sapiens cDNA FLJ11533 fis, clone HEMBA1002678
0.040                           |                 | AA425310           | Hs.155766  | ESTs, Weakly similar to A47582 B-cell growth factor precursor [H.sapiens]
-0.056                          |                 | AW976347           | Hs.76966   | ESTs
C\ 1 11                         |                 | H74319             | Hs.188620  | ESTs
-0.298                          |                 | AW079749           | Hs.184719  | ESTs
-0.303                          |                 | AI459519           | Hs.314437  | clone IMAGE:4607209, mRNA sequence [H.sapiens]
-0.319                          |                 | H77824             | NA         | ESTs
-0.321                          |                 | AA830335           | Hs.105273  | ESTs
-0.602                          |                 | W22152             | Hs.282929  | ESTs
-0.723                          |                 | R07785             | Hs.429867  | ESTs
-1.306                          | 7 & 28          | U50535             | Hs.110630  | Human BRCA2 region, mRNA sequence CG006
Table 6: Full Length Nucleic Acid and Protein Sequences of Some Genes That 
Characterize Metastatic Colorectal Cancer 

NUCLEIC ACID SEQUENCES 

Seq ID NO: 1 
Primekey #: 446619 
Coding sequence: 88..990 

1          11         21         31         41         51
|          |          |          |          |          |
GCAGAGCACA GCATCGTCGG GACCAGACTC GTCTCAGGCC AGTTGCAGCC TTCTCAGCCA 60
AACGCCGACC AAGGAAAACT CACTACCATG AGAATTGCAG TGATTTGCTT TTGCCTCCTA 120
GGCATCACCT GTGCCATACC AGTTAAACAG GCTGATTCTG GAAGTTCTGA GGAAAAGCAG 180
CTTTACAACA AATACCCAGA TGCTGTGGCC ACATGGCTAA ACCCTGACCC ATCTCAGAAG 240
CAGAATCTCC TAGCCCCACA GACCCTTCCA AGTAAGTCCA ACGAAAGCCA TGACCACATG 300
GATGATATGG ATGATGAAGA TGATGATGAC CATGTGGACA GCCAGGACTC CATTGACTCG 360
AACGACTCTG ATGATGTAGA TGACACTGAT GATTCTCACC AGTCTGATGA GTCTCACCAT 420
TCTGATGAAT CTGATGAACT GGTCACTGAT TTTCCCACGG ACCTGCCAGC AACCGAAGTT 480
TTCACTCCAG TTGTCCCCAC AGTAGACACA TATGATGGCC GAGGTGATAG TGTGGTTTAT 540
GGACTGAGGT CAAAATCTAA GAAGTTTCGC AGACCTGACA TCCAGTACCC TGATGCTACA 600
GACGAGGACA TCACCTCACA CATGGAAAGC GAGGAGTTGA ATGGTGCATA CAAGGCCATC 660
CCCGTTGCCC AGGACCTGAA CGCGCCTTCT GATTGGGACA GCCGTGGGAA GGACAGTTAT 720
GAAACGAGTC AGCTGGATGA CCAGAGTGCT GAAACCCACA GCCACAAGCA GTCCAGATTA 780
TATAAGCGGA AAGCCAATGA TGAGAGCAAT GAGCATTCCG ATGTGATTGA TAGTCAGGAA 840
CTTTCCAAAG TCAGCCGTGA ATTCCACAGC CATGAATTTC ACAGCCATGA AGATATGCTG 900
GTTGTAGACC CCAAAAGTAA GGAAGAAGAT AAACACCTGA AATTTCGTAT TTCTCATGAA 960
TTAGATAGTG CATCTTCTGA GGTCAATTAA AAGGAGAAAA AATACAATTT CTCACTTTGC 1020
ATTTAGTCAA AAGAAAAAAT GCTTTATAGC AAAATGAAAG AGAACATGAA ATGCTTCTTT 1080
CTCAGTTTAT TGGTTGAATG TGTATCTATT TGAGTCTGGA AATAACTAAT GTGTTTGATA 1140
ATTAGTTTAG TTTGTGGCTT CATGGAAACT CCCTGTAAAC TAAAAGCTTC AGGGTTATGT 1200
CTATGTTCAT TCTATAGAAG AAATGCAAAC TATCACTGTA TTTTAATATT TGTTATTCTC 1260
TCATGAATAG AAATTTATGT AGAAGCAAAC AAAATACTTT TACCCACTTA AAAAGAGAAT 1320
ATAACATTTT ATGTCACTAT AATCTTTTGT TTTTTAAGTT AGTGTATATT TTGTTGTGAT 1380
TATCTTTTTG TGGTGTGAAT AAATCTTTTA TCTTGAATGT AATAAGAATT TGGTGGTGTC 1440
AATTGCTTAT TTGTTTTCCC ACGGTTGTCC AGCAATTAAT AAAACATAAC CTTTTTTACT 1500
GCCTAAAAAA AAAAAAAAAA AAAA 1524
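
The "Coding sequence" entry of each listing gives 1-based, inclusive start and end positions within the numbered sequence that follows. As an illustration only, the following Python sketch shows one way such a listing could be read and its coding region pulled out. The function names (parse_listing_lines, extract_cds, looks_like_orf) are hypothetical helpers introduced for this example and are not part of the disclosure; the two example lines are the first two lines of Seq ID NO: 1 above.

```python
import re

# First two numbered lines of Seq ID NO: 1, as printed in the listing.
EXAMPLE_LINES = [
    "GCAGAGCACA GCATCGTCGG GACCAGACTC GTCTCAGGCC AGTTGCAGCC TTCTCAGCCA 60",
    "AACGCCGACC AAGGAAAACT CACTACCATG AGAATTGCAG TGATTTGCTT TTGCCTCCTA 120",
]

STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_listing_lines(lines):
    """Join numbered listing lines into one uppercase DNA string.

    Each line holds blocks of ten bases followed by a running position
    count; digits and whitespace are discarded, only A/C/G/T/N are kept.
    """
    joined = "".join(lines).upper()
    return re.sub(r"[^ACGTN]", "", joined)


def extract_cds(sequence, start, end):
    """Return the region for 1-based, inclusive coordinates (e.g. 88..990)."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


def looks_like_orf(cds):
    """Basic sanity check: whole codons, ATG start, recognised stop codon."""
    return (
        len(cds) % 3 == 0
        and cds.startswith("ATG")
        and cds[-3:] in STOP_CODONS
    )


example_sequence = parse_listing_lines(EXAMPLE_LINES)
# Positions 88..90 should hold the start codon of the stated coding sequence.
start_codon = extract_cds(example_sequence, 88, 90)
```

Applied to the full 1524-base listing, extract_cds(sequence, 88, 990) would return the stated coding region, and looks_like_orf offers a quick check that the coordinates were transcribed correctly.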



Seq ID NO: 2 
Primekey #: 408199 
Coding sequence: 27..734 

1          11         21         31         41         51
|          |          |          |          |          |

GTGCAAGCAT CTGAAGAGCT GCCGGGATGC AGCAGAGAGG AGCAGCTGGA AGCCGTGGCT 60 

GCGCTCTCTT CCCTCTGCTG GGCGTCCTGT TCTTCCAGGG TGTTTATATC GTCTTTTCCT 12 0 

TGGAGATTCG TGCAGATGCC CATGTCCGAG GTTATGTTGG AGAAAAGATC AAGTTGAAAT 18 0 

GCACTTTCAA GTCAACTTCA GATGTCACTG ACAAACTTAC TATAGACTGG ACATATCGCC 24 0 

CTCCCAGCAG CAGCCACACA GTAT CAATAT TTCATTATCA GTCTTTCCAG TACCCAACCA 3 00 

CAGCAGGCAC ATTTCGGGAT CGGATTTCCT GGGTTGGAAA TGTATACAAA GGGGATGCAT 3 60 

CTATAAGTAT AAGCAACCCT ACCATAAAGG ACAATGGGAC ATTCAGCTGT GCTGTGAAGA 42 0 

ATCCCCCAGA TGTGCATCAT AATATTCCCA TGACAGAGCT AACAGTCACA GAAAGGGGTT 48 0 

TTGGCACCAT GCTTTCCTCT GTGGCCCTTC TTTCCATCCT TGTCTTTGTG CCCTCAGCCG 54 0 

TGGTGGTTGC TCTGCTGCTG GTGAGAATGG GGAGGAAGGC TGCTGGGCTG AAGAAGAGGA 60 0 

GCAGGTCTGG CTATAAGAAG TCATCTATTG AGGTTTCCGA TGACACTGAT CAGGAGGAGG 66 0 

AAGAGGCGTG TATGGCGAGG CTTTGTGTCC GTTGCGCTGA GTGCCTGGAT TCAGACTATG 72 0 

AAGAGACATA TTGATGAAAG TCTGTATGAC ACAAGAAGAG TCACCTAAAG ACAGGAAACA 78 0 
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TCCCATTCCA CTGGCAGCTA AAGCCTGTCA GAGAAAGTGG AGCTGGCGTG GACCATAGCG 84 0 

ATGGACAATC CTGGAGATCA TCAGTAAAGA CTTTAGGAAC CACTTATTTA TTGAATAAAT 90 0 

GTTCTTGTTG TATTTATAAA CTGTTCAGGA ACTCTCATAA GAGACTCATG ACTTCCCCTT 960 

TCAATGAATT ATGCTGTAAT TGAATGAAGA AATTCTTTTC CTGAGCAAAA AGATACTTTT 102 0 

TGATTCATCT TTGCTCTGGA ATGTATTACA TGTTTTCTTC CAACTGTTTG AAGGAGAATT 10 8 0 

TTGAATGTTT GCCACACCGC TGATACC CAA ATAATTTTTT AAATGAAGTG GAGCTTGTGG 114 0 

CTTCCTGATG TGTCACCAGA CAAAATATTC GCTTGGGATA TGTATTCTTT GTTTTTTGCT 12 0 0 

C C ATGTAC AC TTTCAGCTGT GAGTTAGTAT AGGGCGTATA CTTACCGGTT TAATGACCTC 12 60 

AACCTCAGTT GTGTTTGGAT AACTTAGGGT GTATACCCTT AGTTTCCTTA GAGTTGGTAG 132 0 

GAT CAAGTCA TTGGTTTGCT TTGACTGGGT TTTTAAAGTA TTAAGTACAG TGTCATCAAT 13 8 0 

TTACAGTTAA GGAAAGGAAT CGTGAAGTAG AAAAATTATT TTCTTTAGTC TTGCTGGTAC 144 0 

AATTTGGGCT AAGGAGTCTT TGTTATTTTC TGTCTTGCTT TTTTTTTTTT TTTTTTTTTT 15 0 0 

TTGAGGCAGA GTCTCACTCT GTCGCCAGGC TGGAGTGCAG TGGTGTGATC TTGGCTCACT 1560 

GCAACCTCTG CCTCCTGGGT TCAAGCGATT CTTGTGCCTC AGCCTCTCGA GTAGCTGGGA 162 0 

TTACAGGCAT GCGCCACCAC ACCCAGCTAA TTTTTGTGTT TTTAGTAGAG ACGGGGTTTC 168 0 

AC C ATTTTGG CCAGGATGGT CTCAATCCCC TGACCTCGTG ATCCACCTGC CTCGGCCTCC 174 0 

CAAAGTGTTG GGATTACAGG CATGAGCCAC TGTGCTTGGC CTGTTATTTT ATTTTCTTAT 1800 

AACTACAACT TTTCTTCTTG AATTTTCAGG TCAGAGGCAA GAAAAACTCT TTACAGGTTT 186 0 

TTAGTGGGGG GCTTATGGAG TATTTCAGGA GTTCTTTGCA AATTAAATCA TCTTTTCACT 192 0 

TGTATTGTTT TTCAAAACTT TGTTGATTTC TAAAATGTGC CAACTGTGAG TAAACTATGG 198 0 

TATTTGCAAG TGGTTTTTAC ATAATATTTG AGATGAGGAA GTGAGATTGT GCATGACATA 2 04 0 

CTTCTCCTTT GTATTCTCTC AGTGCCTTAC AGCAGGTTAC TCCATTCTGC TATGACAACT 210 0 

TGTTTCAAAT GTTAATTTAC ATAGGATTTT TTATAAGCCA TTAAGGCATA TGTATAGTAT 216 0 

ATCAGTAAAG ATGGATGGTG CATATATAAA TAGTCTTCTG TAATAGTGAT TGGATTTACT 2 22 0 

TCTCAATTAT GAGAGACAAA AATTATCCCC TCACCTGTCT CTATTCTTTC AACAGGTTGA 22 8 0 

TCCCTTTTCA TGATTTTTCA TTAGGTGGTT CAGGAAGTTT C CATATTAC A GCGCTTCACA 2 34 0 

CTGTATATGT TAGTTTAAAA ATCACTTTTC TCTCTCTCAA CTTCTTTCTT TTTTTTTTGA 24 0 0 

AGACTTAATT TAAAAAATTT GGGTTGTTAG ATCCGTATCA TAGATTTGGC CTAGCCTCTT 24 60 

CTGTTAACCT AGTCCACAGA TGAGCGAATC TGGTTAGTTG AAGGACATTG TGATTTGACT 2 52 0 

CTGGTCACGC GAGGAAGTAG AAGGGCAAAG ACAGGACCGG CAGTTTACAT TTCCAGTGGT 2 58 0 

TAAACCTCAC GGTACTTTGG GACTGCTTGT TAACTTTTGT GGTTGTCTGA GGCCAATCTA 2 64 0 

ACGTGACCAT TTCTGACACC TCAACAGAGA GAGGAAAGCA ACTTGAGCAA TGAGAGTAAA 2 70 0 

TAACTTGGGC TCTCAGAGAT TTGAAGATAG AGATCTCATT GTGAGGGGGA CTATTTTGCA 2 760 

GGTCCTCATT TCTCCAAGAA AGAGATGGTG TTACAGGAAC CCACTGAAAG CCATATCCCA 2 82 0 

TTAAATGAGG AACTAATTTT GGCTGGGCCT TCTTGTAATG TCCTCGCAGG TGTGTTGTGA 2 88 0 

AGATTAATGC AGGGTAGTAT GTTTGTAGAT TGAC AC C TAG TCTAAACTTG AGGTAATTGG 2 94 0 

TGCTCTGTGA ATACTCAGTC GTGTTCTTTT ATAGCCTTAA TCATGATTTG AACTAGTCCC 300 0 

TTGCTTTTTA AATGACTGAA TGAAGTCCTT CGTGGTAAGG GAGTACGTTG ATAACTTAGT 3 0 60 

TTACTATATG GGTTTGTGGT CGCATCCCAG TCATCAGCTG CTATCATTTT CCTTCTTCAT 3120 

CCCTTATACT GAGATTTGGG TTACAGCTTT TTATTCTTCG AAGGATCACA AAGCAGTGTA 318 0 

CAGACACCTG CCTTCTTTAA GGATGAAAGG AAGATAAAGT GGTCTTTTTT TGTTTACTTA 3 24 0 

TTTGTTTCAC CTCTTGTTTG AGTAACTTCT AAGGTGCTAT TCTCTCTCTC TTTTTGCTAC 33 0 0 

CTCATGAGCT CTTGTCACAG CCATGGAAAC CAGCCTCGTT TAGAAAGGGA ACTTAGTTCA 3 3 60 

GAAGGGGTTA AAAGCCTTCC AGAATTTTTC TTTAGCTGCT GAAGTTTTTA CATGTGGTTA 342 0 

CATGACTTTA AGTTTTATGC ATTACGCTCT TAATTCTATT ACAAAATGTG GACTCACCAA 34 8 0 

TTGCTTTGTG TTTTCCATGT GACCTGTTAC TTCAGGCTAC TTGGGGAACA TCTTAGTCCT 3 54 0 

CTGTAGCTCC TGAACCCAGC ACTGGTGCTT CAAGAGAGAA GGTAGCACGT CTTTGTTCAA 3 60 0 

AACAAAACAA AACGACACTT CTGGAGGCCA CATCCTGAAT ATGAATGTTC TACTAAGTCA 3 660 

CTCAGTTATG GTTCTAAAGG GAAACTGTAA GAAGAC CCAC AAGGAGTGGA CCAAGAC TAT 3 72 0 

TATTTAATTG CACAACTTGA AACTTTGCTG CCAGAAGAGG CAGCTCCATT CCTTTGACTC 3 78 0 

CAGTGTTGGG CTGTTAACTG CTGCACCTCA TTGCCTTTTT TTGTTTTTGT TTTTGTTTTG 3 84 0 

TAGGAGGGTA GGCACTGTTG GGCCATATGC ACAAATATTG TAACTCTTGG TATCTTTACT 3 90 0 

GCAT CATAGT CAATAAACTT CTTTGTACCC TT 3 93 2 
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TGAAGGTAAA ATTTTCCAGA TACGGCAGAC GGCTTTCAGA GTACAATAAA CAGGGAATGA 60 

GAACTATTTA CATGGAAGTT TCTTTCTCAT GATGCGGTGG AGAAGCGTCG GCCACTTGGT 12 0 

TCTGCCAGAT GTTCCTGGGG TTACTGTAAA TGGGAAGGAC AGGCAGAGCT AAACAAGGTT 180 

TATCATTTAA AAGTGCCTGT GTGAAGTCAC TTTTGCTGGA AAACTGCAGC TTGGGAGCTT 24 0 

TCTTTGTATT CACATCCCAC TCTTCTGTGA AGTACACTTT ACCCTGACCT TATGAGTGGA 3 00 

TGAAGATACC TCAGTTGTCT GACTTTGCCA ATTGCTTAAT TTCAGAATTT AAAAAGGGGA 3 60 

AAGAAAAACA TCCTGCTAAA ATATGAACAT CTGAGTGTCT TATTTTCCAA CATCGTCAAT 42 0 

AGCTGTGAGC GTCAGCATTA AATATTCTCC CAAGGAGTGC CATGATATTG AAGTCACTTT 4 80 

ATTAATAACA GCTGTATCTG CAAAACAGTC AAGAGACTCG GACGTTGAAA GCCAGAGATG 540 

ACACTGAGCA TGCTTTTATT GCGGCCTACC ATCTTTAAGT GGGACATATT GATTGATGAG 60 0 

TGATTGCCTG TCCATACACT CTCTCATCAT CCTGTTCCTT GGATTGGACT TCACTAAGCA 660 

ATTTATCACT CACCTTCAGA CTTACATGTG GGAGTTTTCA CAACAGTAGT TTTGGAATCA 72 0 

TTAGAACTTG GATTGATTTC ATCATTTAAC AGAAACAAAC AGCCCAAATT ACTTTATCAC 78 0 

CATGGCTTTG AACGTTGCCC CAGTCAGAGA TACAAAATGG CTGACATTAG AAGTCTGCAG 84 0 

ACAGTTTCAA AGAGGAACAT GCTCACGCTC TGATGAAGAA TGCAAATTTG CTCATCCCCC 90 0 

CAAAAGTTGT CAGGTTGAAA ATGGAAGAGT AATTGCCTGC TTTGATTCCC TAAAGGGCCG 960 

TTGTTCGAGA GAGAACTGCA AGTATCTTCA CCCTCCGACA CACTTAAAAA CTCAACTAGA 102 0 

AATTAATGGA AGGAACAATT TGATT CAGCA AAAAACTGCA GCAGCAATGC TTGCCCAGCA 10 8 0 

GATGCAATTT ATGTTTCCAG GAACACCACT TCATCCAGTG CCCACTTTCC CTGTAGGTCC 114 0 

CGCGATAGGG ACAAATACGG CTATTAGCTT TGCTCCTTAC CTAGCACCTG TAACCCCTGG 12 0 0 

AGTTGGGTTG GTCCCAACGG AAATTCTGCC CACCACGCCT GTTATTGTTC CCGGAAGTCC 12 60 

ACCGGTCACT GTCCCGGGCT CAACTGCAAC TCAGAAACTT CTCAGGACTG ACAAACTGGA 132 0 

GGTATGCAGG GAGTTCCAGC GAGGAAACTG TGCCCGGGGA GAGAC CGACT GCCGCTTTGC 13 8 0 

ACACCCCGCA GACAGCACCA TGATCGACAC AAGTGACAAC ACCGTAACCG TTTGTATGGA 144 0 

TTACATAAAG GGGCGTTGCA TGAGGGAGAA ATGCAAATAT TTTCACCCTC CTGCACACTT 15 0 0 

GCAGGCCAAA ATCAAAGCTG CGCAGCACCA AGCCAACCAA GCTGCGGTGG CCGCCCAGGC 15 6 0 

AGCCGCGGCC GCGGCCACAG TCATGGCCTT TCCCCCTGGT GCTCTTCATC CTTT AC C AAA 162 0 

GAGACAAGCA CTTGAAAAAA GCAATGGTAC CAGCGCGGTC TTTAACCCCA GCGTCTTGCA 168 0 

CTACCAGCAG GCTCTCACCA GCGCACAGTT GCAGCAACAC GCCGCGTTCA TTCCAACAGG 1740 

GTCAGTTTTG TGCATGACAC CCGCTACCAG TATTGTACCC ATGATGCACA GCGCTACGTC 18 0 0 

CGCCACTGTC TCTGCAGCAA CAACTCCTGC AACAAGTGTC CCCTTCGCAG CAACAGCCAC 18 6 0 

AGC CAATCAG ATAATTCTGA AATAATCAGC AGAAACGGAA TGGAATGCCA AGAATCTGCA 192 0 

TTGAGAATAA CTAAACATTG TTACTGTACA TACTATCCTG TTTCCTCCTC AATAGAATTG 19 8 0 

CCACAAACTG CATGCTAAAT AAAGATGTAG TTCTTCTGGA C AG AC C AC AA CTCTAAGAAG 2 04 0 

CTAGTGCTGC TATCTCATAT ATGAGTATTA AATATGGTAT GCTTACTATA TTCCAACCTA 210 0 

AGATAGTTAA CTACCTGAGA CCAGCTGTGA TGTTTAAAGA CATAAAGGAT AAAGTTTACT 216 0 

TTTAAAGGGT TTCTAAACAT AGTTTCTGTC CTAGGAATAT TGTCTTATCT CCATAACTAT 22 2 0 

AGCTGATGCA GAAAGTC CAG CCAGTTTACT CATTTCGATT CAGAATATTT CAAATTTAGC 22 8 0 

AATAAACAAT TAGCATTAGT TAAAAAAGAA AC AT ATT C C A AGGGCAGGTT CGATTCTAGC 2 34 0 

TCTAATTACT GTCATGTCAT TTACCCACTG GATCAAAGGG TATGTTTCAC TTCTTGACAA 24 0 0 

TATAAATGCT GCAGCAAAGA TGAGAGGTGA AGTAAAACCG ATACCTGTCC TGCAGGTCTA 2 4 60 

AAATTTGAAT GGAAATTCAA GCACAAGTAC TGGGGACACA TCAAAGTGTG GTGTTTGGTT 2 52 0 

TGCCTGGAGA TGCCACGTTG AATCATGTGA TTCTAGATTA ACATTAAATA GATTGAAAAA 2 58 0 

GAAACTTTGC ACGGTATGAG CTTCATACCC CACCAAACAA AGTCTTGAAG GTATTATTTT 2 64 0 

ACAAGTATAT TTTTAAAGTT GTTTTATAAG AGAGACTTTG TAGAAGTGCC TAGATTTTGC 2 70 0 

CAGACTTCAT CCAGCTTGAC AAGATTGAGA GGCCCATGCC AACAGTCTAA TCTAAGAGAT 2 76 0 

TAGTCTTTCA AACTCACCAT CCAGTTGCCT GTTACAGAAT AACTCTTCTT AACTAAAAAC 2 82 0 

CTAGTCAAAC AAGGAAGCTG TAGGTGAGGA GAT C TGTAT A ATATTCTAAT TTAAGTAAGT 2 88 0 

TTGAGTTTAG TCACTGCAAA TTTGACTGTG ACTTTAATCT AAATTAC TAT GTAAACAAAA 2 94 0 

AGTAGATAGT TTCACTTTTT AAAAAATCCA TTACTGTTTT GCATTTCAAA AGTTGGATTA 3 00 0 

AAGGGTTGTA ACTGACTACA GCATGGAAAA AAATAGTTCT TTTAATTCTT TCACCTTAAA 3 0 60 

GCATATTTTA TGTCTCAAAA GTATAAAAAA CTTTAATACA AGTACATACA TATTATATAT 312 0 

ACACATACAT ATATATACTA TATATGGATG AAACATATTT TAATGTTGTT TACTTTTTTA 318 0 

AATACTTGGT TGATCTTCAA GGTAATAGCG ATACAATTAA ATTTTGTTCA GAAAGTTTGT 3 24 0 

TTTAAAGTTT ATTTTAAGCA CTATCGTACC AAATATTTCA TATTTCACAT TTTATATGTT 3 3 00 

GC AC AT AGC C TATACAGTAC CTACATAGTT TTTAAATTAT TGTTTAAAAA ACAAAACAGC 33 6 0 

TGTTATAAAT GAATATTATG TGTAATTGTT TCAAACATCC ATTTTCTTTG TGAACATATT 342 0 

AGTGATTGAA GTATTTTGAC TTTTGAGATT GAATGTAAAA TATTTTAAAT TTGGGATCAT 34 8 0 
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CGCCTGTTCT G AAAAC TAG A TGCACCAACC GTATCATTAT TTGTTTGAGG AAAAAAAGAA 3 54 0 

ATCTGCATTT TAATTCATGT TGGTCAAAGT CGAATTACTA TCTATTTATC TTATATCGTA 3 60 0 

GATCTGATAA CCCTATCTAA AAGAAAGTCA CACGCTAAAT GTATTCTTAC ATAGTGCTTG 3 660 

TATCGTTGCA TTTGTTTTAA TTTGTGGAAA AGTATTGTAT CTAACTTGTA TTACTTTGGT 3 72 0 

AGTTTCATCT TTATGTATTA TTGATATTTG TAATTTTCTC AACTATAACA ATGTAGTTAC 3 78 0 

GCTACAACTT GCCTAAAACA TTCAAACTTG TTTTCTTTTT TGTGTTTTTT TCTTTGTTAA 3 840 

TTCATTTAAA CTCATTGAAA ACATAGTATA CATTACTAAA AGGTAAATTA TGGGAATCAC 3 90 0 

TGAAATATTT TTGTAGATTA ATTGTTGTAA CATTGTCTTT CTTTTTTTTC TTTTGTTTCA 3 960 

TGATTTTGAT TTTTAAAATT ATTAGCACAC AACTATTTTC AGCCCTTTAA TAATGGAGCA 402 0 

T C AAAAAC AT CACCTGTAAC CCCAAGCAAA TATAGAAGAC TGTATTTTTT ACTATGATAT 40 80 

CCATTTTCCA GAATTGTGAT TACAATATGC AAAG AGT CAT AAATATGCCA TTTACAATAA 414 0 

GGAGGAGGCA AGGCAAATGC ATAGATGTAC AAATATATGT ACAACAGATT TTGCTTTTTA 42 0 0 

TTTATTTATA ATGTAATTTT ATAGAATAAT TCTGGGATTT GAGAGGATCT AAAACTATTT 42 60 

TTCTGTATAA ATATTATTTG CCAAAAGTTT GTTTATATTC AGAAGTCTGA CTATGATGAA 4 32 0 

TAAATCTTAA ATGCTTTGTT TAATTAAAAA ACAAAAATCA CCAATATCCA AGACATGAAG 43 8 0 

ATATCAGTTC AACAAATACT GTAGTTAAGA GACTAACTCT CCACTTGTAT GGGAACTACA 444 0 

TTTCACTCTT GGTTTTCAGG ATATAACAGC ACTTCACCGA AATATTCTTT CAGCCATACC 450 0 

ACTGGTAACA TTTCTACTAA ATCTTTCTGT AACACTTAAA GAATTCCCTC ATTCATTACC 45 60 

TTACAGTGTA AACAGGAGTC TAATTTGTAT CAATACTATG TTTTGGTTGT AATATTCAGT 462 0 

TCACTCACCC AATGTACAAC CAATGAAATA AAAGAAGCAT TTAAA 4 665 
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AGCAGCCGAC GCCGAGAGGC ACCGTTTCTT CTTAAAAGAG AAACGCTGCG CGCGCGAGGT 6 0 

GGGCCCCTGT CTTCCAGCAG CTCCGGGCCT GCTCGCTAGG CCCGGGAGGC GCAGGCGCAG 120 

GCGCAGTGGG GGTGAGGGCG CGTGGGGGCG CACAGCCTCT GGTGCACATG GCTTCCTCCC 18 0 

CGGCGGTGGA CGTGTCCTGC AGGCGGCGGG AGAAGCGGCG GCAGCTGGAC GCGCGCCGCA 240 

GCAAGTGCCG CATCCGCCTG GGCGGCCACA TGGAGCAGTG GTGCCTCCTC AAGGAGCGGC 3 00 

TGGGCTTCTC CCTGCACTCG CAGCTCGCCA AGTTCCTGTT GGACCGGTAC ACTTCTTCAG 3 60 

GCTGTGTCCT CTGTGCAGGT CCTGAGCCTT TGCCTCCAAA AGGTCTGCAG TATCTGGTGC 42 0 

TCTTGTCTCA TGCCCACAGC CGAGAGTGCA GCCTGGTGCC CGGGCTTCGG GGGCCTGGCG 4 80 

GCCAAGATGG GGGGCTTGTG TGGGAGTGCT CAGCAGGCCA TACCTTCTCC TGGGGACCCT 54 0 

CTTTGAGCCC TACACCTTCA GAGGCACCCA AGCCAGCCTC CCTTCCACAT ACTACTCGGA 600 

GAAGTTGGTG TTCCGAGGCC ACGAGTGGGC AGGAGCTTGC AGATTTGGAA TCTGAGCATG 660 

ATGAGAGGAC TCAAGAGGCC AGGTTGCCCA GGAGGGTGGG ACCCCCACCA GAGACCTTCC 72 0 

CACCTCCAGG AGAGGAAGAG GGTGAGGAAG AAGAGGACAA TGATGAGGAT GAAGAGGAGA 780 

TGCTCAGTGA TGCCAGCTTA TGGACCTACA GCTCCTCCCC AGATGATAGT GAGCCTGATG 84 0 

CCCCCAGACT ACTGCCTTCC CCTGTCACCT GCACACCTAA AGAGGGGGAG ACACCACCAG 90 0 

CCCCTGCAGC ACTCTCCAGT CCTCTTGCTG TGCCGGCCTT GTCAGCATCC TCATTGAGTT 960 

CCAGAGCTCC TCCACCTGCA GAAGTCAGGG TGCAGCCACA GCTCAGCAGG ACCCCTCAAG 1020 

CGGCCCAGCA GACTGAGGCC CTGGCCAGCA CTGGGAGTCA GGCCCAGTCT GCTCCAACCC 1080 

CGGCCTGGGA TGAGGACACT GCACAAATTG GCCCCAAGAG AATTAGGAAA GCTGCCAAAA 1140 

GAGAGCTGAT GCCTTGTGAC TTCCCTGGCT GTGGAAGGAT CTTCTCCAAC CGGCAGTATT 1200 

TGAATCACCA CAAAAAGTAC CAGCACATCC AC C AGAAGTC TTTCTCCTGC CCAGAGCCAG 12 60 

CCTGTGGGAA GTCTTTCAAC TTTAAGAAAC AC CTGAAGGA GCACATGAAG CTGCACAGTG 132 0 

ACACCCGGGA CTACATCTGT GAGTTCTGCG CCCGGTCTTT CCGCACTAGC AGCAACCTTG 13 8 0 

TCATCCACAG ACGTATCCAC ACTGGAGAAA AACCCCTGCA GTGTGAGATA TGCGGGTTTA 1440 

CCTGCCGCCA GAAGGCTTCC CTGAACTGGC AC CAGCGCAA GCATGCAGAG ACGGTGGCTG 15 0 0 

CCTTGCGCTT CCCCTGTGAA TTCTGCGGCA AGCGCTTTGA GAAGCCAGAC AGTGTTGCAG 15 60 

CCCACCGTAG CAAAAGTCAC CCAGCCCTGC TTCTAGCCCC TCAAGAGTCA CCCAGTGGTC 162 0 

CCCTAGAGCC CTGTCCCAGC ATCTCTGCCC CTGGGCCTCT GGGATCCAGC GAGGGGTCCA 168 0 

GGCCCTCTGC ATCTCCTCAG GCTCCAACCC TGCTTCCTCA GCAATGAGCT CTCCTCCAGC 1740 

TTTGGCTTTG GGAAGCCAGA CTCCAGGGAC TGAAAAGGAG CAACAAGGAG AGGGTCTGCT 18 0 0 

TGAGAAATGC CAGATGCTTG GTCCCCAGGA ACTAAGGCGA CAGAGTGCAG GGTGGGGGCA 18 60 
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AGACTGGGCT GTAGGGGAGC TGGACTACTT TAGTCTTCCT AAAGGACAAA ATAAACAGTA 192 0 
TTTTATGCAG GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 19 8 0 
AAAAAA 1986 
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CGGACGCGTG GGCTGAGGCG GCGCTGTGTG TGTGAAGCGT ACCTAGGGCG GGAGGCGACA 60 

TGGAGACAGG GGCGGCCGAG CTGTATGACC AGGCCCTTTT GGGCATCCTG CAGCACGTGG 12 0 

GCAACGTCCA GGATTTCCTG CGCGTTCTCT TTGGCTTCCT CTACCGCAAG ACAGACTTCT 180 

ATCGCTTGCT GCGCCACCCA TCGGACCGCA TGGGCTTCCC GCCCGGGGCC GCGCAGGCCT 24 0 

TGGTGCTGCA GGTATTCAAA ACCTTTGACG ACATGGCCCG TCAGGATGAT GAGAAGAGAA 3 00 

GGCAGGAACT TGAAGAGAAA AT C AG AAGAA AGGAAGAGGA AGAGGC CAAG ACTGTGTCAG 3 60 

CTGCTGCAGC TGAGAAGGAG CCAGTCCCAG TTCCAGTCCA GGAAATAGAG ATTGACTCCA 42 0 

CCACAGAATT GGATGGGCAT CAGGAAGTAG AGAAAGTGCA GCCTCCAGGC CCTGTGAAGG 480 

AAATGGCCCA TGGTTCACAG GAGGCAGAAG CTCCAGGAGC AGTTGCTGGT GCTGCTGAAG 54 0 

TCCCTAGGGA ACCACCAATT CTTCCCAGGA TTCAGGAGCA GTTC CAGAAA AATCCCGACA 60 0 

GTTACAATGG TGCTGTCCGA GAGAACTACA CCTGGTCACA GGACTATACT GACCTGGAGG 660 

TCAGGGTGCC AGTAC C CAAG CACGTGGTGA AGGGAAAGCA GGTCTCAGTG GCCCTTAGCA 72 0 

GCAGCTCCAT TCGTGTGGCC ATGCTGGAGG AAAATGGGGA GCGCGTCCTC ATGGAAGGGA 780 

AGCTCACCCA CAAGATCAAC ACTGAGAGTT CTCTCTGGAG TCTCGAGCCC GGGAAGTGCG 84 0 

TTTTGGTGAA CCTGAGCAAG GTGGGCGAGT ATTGGTGGAA CGCCATCCTG GAGGGAGAAG 90 0 

AGCCCATCGA CATTGACAAG ATCAACAAGG AGCGCTCCAT GGCCACCGTG GATGAGGAGG 960 

AACAGGCGGT GTTGGACAGG CTTACCTTTG ACTACCACCA GAAGCTGCAG GGCAAGCCAC 102 0 

AGAGCCATGA GCTGAAAGTC CATGAGATGC TGAAGAAGGG GTGGGATGCT GAAGGTTCTC 1080 

CCTTCCGAGG CCAGCGATTC GACCCTGCCA TGTTCAACAT CTCCCCGGGG GCTGTGCAGT 114 0 

TTTAATGACC AGAAGGAAAG GAAACCCTCG CCGGTGGGGA GGCAGAGCCT TATCCTCGGC 12 0 0 

TGCCCTTCTT GGCTCCCTGC ATTCCAGGGA CTTGCTCGTC TTGTTTACCC CTAGCCATCC 12 60 

TTTCTTTCAA GGGTGAAC C A GGCCTTCCAC CCTGACCTTG CATCTCCAGA CTGTTCCAGA 132 0 

GAAGGTGCGG GGCCAGCTGC TATGTGGTGG CCGCTGTGGC TGACACTGAG TGAAGGTGTT 1380 

TGAAATGCAG GAGAGGATAT CCCAGCAAAT TGGGATCACA TGCTTTTGTC TCCACAGCAA 144 0 

CCAGCCACTG CAGGCAGCAT GTCTTTCCTC CCCTGCTCTC TGCTTGCTGT TGTTTTGACG 15 0 0 

CTATTCTGCT TGCATGTCTT CTGGTTGGGA TGTGGAGTTG TTGCTGGACT CTCAGGCGAA 15 6 0 

GCTGAAGTCA TTGAAGTGTG TGAAGCTCTG TGCTTGCATG AGGGCAAGCA AGGAATGGCT 162 0 

GTGCCTGAGG CTGCTCTGGG AAACTCCTTG CCCCTTGACC TCTTTTGAGA GCATTCACGT 1680 

GGTCTTCTTG CTCATCCCCT TATAAATGTG CTTTGCCTGC CTCAGCCTCA TGGTCAGAGC 174 0 

AGTGGAGACT GGAGCCCTGT TTGCACGTTC TAGTTGTTCG GAGAAAGCCT AGGTTCTGGG 180 0 

CTCAGGTCCA GATGCAGCGG GGATTCTGTT CTCTGACTGT GGCGACCTTG CTTTGGTTCT 1860 

TGTTGAAGTG AACCAAGCCC GGCCACCACG CATGGCATGC TGTGCTTGGC TCCCCATAAG 192 0 

ACGTCCTCTT TGGGTGCACG GTGTCAAAGT GTGGGCAGGA GTGGAGAGCT GGTGCCCTCA 1980 

GGAGGAGACC ACAGCATGTC CATCAGCTCA GCAGAGCTCG ACAGCCACAA GTCCTGAGAA 2040 

GCTTTGACCT TGAAGGGCTT CTGGGAGAGG AGGAATTTCT GCATGGGGCG TGAAGGCACA 210 0 

CTGTCCCACC ACAACTGAAC CAGAAGAGAG TGAAGACTCC CCTCTTCCCA TCCTCTGTGC 2160 

CAGGTGCCAG ACTGTGCTCC TTGGAACTTA TGGCCCAATC TTACCTGTTC TCCAGGGACT 22 2 0 

GGTCACTGCC TCAGGACCCC CAAGCCTATG CCCTGAGCCA TGGCTGCTGA CTGACTCCAG 2280 

C C AAGGTGC A AAGACGAGAT TATGAGACAG GTCCTCAGGC CTGTGTTCCA AGTACTCACA 2 34 0 

GGGGCTCTGG GTGCCCATCG CCGGGAGTAT GGTTCAGCTG CCACCGGCAC TGTCCATTTG 24 0 0 

CCTGTCTGTC AAGCTCAGAG CATGGATAAG CCACACAGCA GGGCAGTGCA CCCTGGCACC 24 6 0 

ATGCACGGCC AGCAAGAATC AAGGCCCGCA GATGCTAAGA GGGCCTATTG TCAGGGGAAG 2 52 0 

GTCCCCGCTC CTGCACACTC TCTATGGATA CTTGGGTTGT GGGGGCTCTC TTGGAGAGTA 2580 

AGTTTGTGGT TTGTTTCTGG TTTACAGTGG TGGCTGACAC CCCTTGTAAG AAAGCATTCC 2 64 0 

TGGGAAGTCT TCTGTGGGTC CAAACATGTT GCTCCGATCA T C AC AGGAGA GCAAAAGGCC 2 70 0 

CTAGATACCC CCTTTGGAAT GTGAGAGTCT TGTTGTCTGA TATTTGCCAC TGAGCTGGTG 27 60 

AAGCCCCTCT AAAGAGATCT CGACCCTGGG GAGCAGAATT CTTGTCATCT ATGAGGGGTC 2 82 0 

CTGAGAAAGA CTTGTCATTT TTTTTCCTGG AGTTCTTCCC ATTGAGGTCC TAGGATTTGC 2880 
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ACACCACTGT CCCACAAGAG CTTTCCTGCC TAATGAAAGG AGGTCTTGTG GTGTGTGTCT 2 94 0 

CCTCTCTTCT CTATAGTTCC CGAGTTGGCC CCCATTGCAG CCCCCACCCT GTGGGTAGTC 3 0 00 

TTCCAGAAGT GATGCAGTGG TGTGAGATGC CCTGCACCTT GTTATTTGGG AGACTTTGAG 3 0 60 

AGTCATTCAC TTCCATGGTG ACTAGTGTTT GTTTTGCCTG ATTTTATATT CTGTGTTGCA 312 0 

TTTCTCCCCA CTCCCTGCCC TGCTTTAATA AACAGCAAAC CAATATCTAG GAAGAATGAC 318 0 

TGAGGGATAG TATTGGGTAT TGGCCCCATG GCAGGAACAG CCACTTGCAT CTGGTCCCGG 3240 

TGCCACACTG CGGTGCTTGG TGTGGTTGTG GAGCCTGTCC CTGCGCGCCT TGCTCCCGTT 33 0 0 

GAGCCACGCT GTCTGGTGGG TGATTCTCTG CCCTGAGCCA CCACCCTGGA CTGGCCCAGT 33 60 

CTCCAGAGCT GGCACACCCT GCCTGTTTTC TCTTTTTAGA CACAACAGCC GCAGTTTGGC 342 0 

CAGCCACTAA GTCCCACCAG CTGAGGTCCG AGGAAAGCGG GGTGACTCAT TTCCCTTGTC 34 8 0 

CAGGGCCCGA GGAGAGTGAG GTGTCCAGCC TGCAAAGCTA TTCCAGCTCC TTGGTGTTGG 3 54 0 

TTTGCAATAA ATTGGTATTT AAGCAAAAAA AAAAAAAAAA AAAA 3 5 84 
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CGTGATCATG AGGGGTTGTG AAGTGCTTGC C C CAT C AGTA GCCATGTGTG CATGTGTAAA 6 0 

TACCATCCTC TGTGTGCCCT GGAGGCTGTC CTT CAGATAG CATGTACAGG TGGCAGCATA 12 0 

GGGCCTGTCC CTACTGAGAG TGCAGGGAAC TCAGCACCGT CAACTCCTCG ACCCTGCAGG 18 0 

TCAGATTATC CTTGTAGAGG CCCCCTGGAT GGCAC CAAGA TCGGCCCTGG CAAGTAGGTG 24 0 

ACCCTGACTT CAGAGCCCTT GCCTGAGGGC CTGGCCTGGC AGCTCTGCTG TTAGAAGCAG 300 

GAGGTGTGCA GAGGGTGGGG AGCAGCCCAG CCTCTGTGAT CTTCTCCATG GCAGGATCTC 3 60 

C C AGCAGGTA GAGCAGAGCC GGAGCC AGGT GCAGGCCATT GGAGAGAAGG TCTCCTTGGC 42 0 

CCAGGCCAAG ATTGAGAAGA TCAAGGGCAG CAAGAAGGCC AT C AAGGTAG TCCCCATACC 4 80 

CCTGTGTCCT GAGGCTACTG GGCAGTCCCT CCATTTCCCC GTGCCTCTGA GGCTGCCCAG 54 0 

TCTCTGCCCT GCTGCCCACC TGTACCTTGA GCTTTCTTCT CGCCCAGGCT TCCAACTCCA 60 0 

CCCTCTCCTG CCAAGCAATC CTAGCCCTCT GAGCCTCTTG GGGCCCCCTC AGACTTGTCC 660 

CTGTGTCCAC AGGTGTTCTC CAGTGCCAAG TACCCTGCTC CAGGGCGCCT GCAGGAATAT 720 

GGCTCCATCT TCACGGGCGC CCAGGACCCT GGCCTGCAGA GACGCCCCCG CCACAGGATC 7 80 

CAGAGCAAGC ACCGCCCCCT GGACGAGCGG GCCCTGCAGG TCTGCTGGCC GCGCATATAG 840 

CCTGTCACAC AC C AGGAGGA CTGGATACTG GGGAGGAGCC GGGGCCACCA TAGGGTTCTG 90 0 

TCCCCCAGAG GAGGCTGACT GGGATGGGAT GGCAGCTGAT TAGGCCCAGC ACCAAATATT 960 

CACCATCCCT TGGCCATCCT GGCCCTCTCA GGAGAAGCTG AAGGACTTTC CTGTGTGCGT 102 0 

GAGCACCAAG CCGGAGCCCG AGGACGATGC AGAAGAGGGA CTTGGGGGTC TTCCCAGCAA 1080 

CATCAGCTCT GTCAGCTCCT TGCTGCTCTT CAACACCACC GAGAAC CTGT ATGGC CAGAG 114 0 

GGCAGGGCCG AGGGGTGTGG GCGGGAGGCC CGGCCTGGCT TAGTGGGGAC CCAGGGCATC 12 0 0 

AGACACAGGT ACAGCACATA GGCCAGGAGC CAGGGGGTGA CGGGTGGCTC GGCTCGGGAG 12 60 

GCCTGGGACC CCACAGTGCA CGCTGTGCCC CTGATGATGT GGGAGAGGAA CATGGGCTCA 13 2 0 

GGACAGCGGG TGTCAGCTTG CCTGACCCCC ATGTCGCCTC TGTAGGTAGA AGAAGTATGT 13 8 0 

CTTCCTGGAC CCCCTGGCTG GTGCTGTAAC AAAGACC CAT GTGATGCTGG GGGCAGAGAC 144 0 

AGAGGAGAAG CTGTTTGATG CCCCCTTGTC CAT C AGCAAG AGAGAG CAGC TGGAACAGCA 15 0 0 

GGTGGGAGGG GTGGGACAGA GGTGGAGACA GGTGCAGTGG CCCAGGGCCT TGCCAGAGCT 15 60 

CCTCTCCAGT CAAGGCTGTT GGGCCCCTTA TTCCACCCAT GGGAGGTGCA CACAAGGTCT 162 0 

TGTTGGCTGC CCCTGCAGGT CCCTGTCACC TCTCACATGT CCCTGCCTAA TCTTGCAGGT 168 0 

CCCAGAGAAC TACTTCTATG TGCCAGACCT GGGCCAGGTG CCTGAGATTG ATGTTCCATC 174 0 

CTACCTGCCT GACCTGCCCG GCATTGCCAA CGACCTCATG TACATTGCCG ACCTGGGCCC 18 0 0 

CGGCATTGCC CCCTCTGCCC CTGGCACCAT TCCAGAACTG CCCACCTTCC ACACTGAGGT 18 6 0 

AGCCGAGCCT CTCAAGACCT ACAAGATGGG GTACTAACAC CACCCCCACC GCCCCCACCA 192 0 

CCACCCCCAG CTCCTGAGGT GCTGGCCAGT GCACCCCCAC TCCCACCCTC AACCGCGGCC 198 0 

CCTGTAGGCC AAGGCGCCAG GCAGGACGAC AGCAGCAGCA GCGCGTCTCC TTCAGGTGGG 2 04 0 

AGCAGCTCTT TGAGGCCACC TGATTTCTGG CGTGCTCAGT GCACTCGGGT GGATTTTCTG 210 0 

TGGGTTTGTT AAGTGGTCAG AAATTCTCAA TTTTTTGAAT AGTTTCCATT TCAAATATCT 2160 

TGTTCTACTT GGTTCATAAA ATAGTGGTTT TCAAACTGTA GAGCTCTGGA CTTCTCACTT 222 0 

CTAGGGCAGA GGGAGCCTGA ACAAGTGAGG CTCTGGGTTC CCCATTCCTA ATTAAACCAA 22 8 0 

TGGAAAGAAG GGGTCTAATA ACAAACTACA G C AAC AC ATT TTTCATTTCA GCTTCACTGC 2 34 0 
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TGTGTCTCCC AGTGTAACCC TAGCATCCAG AAGTGGCACA AAACCCCTCT GCTGGCTCGT 24 0 0 

GTGTGCAACT GAGAC TGTC A GAGCATGGCT AGCTCAGGGG TCCAGCTCTG CAGGGTGGGG 2 4 60 

GCTAGAGAGG AAGCAGGGAG TATCTGCACA CAGGATGCCC GCGCTCAGGT GGTTGCAGAA 2520

GTCAGTGCCC AGGCCCCCAC ACACAGTCTC CAAAGGTCCG GCCTCCCCAG CGCAGGGCTC 2580

CTCGTTTGAG GGGAGGTGAC TTCCCTCCCA GCAGGCTCTT GGACACAGTA AGCTTCCCCA 2640

GCCCTGCCTG AGCAGCCTTT CCTCCTTGCC CTGTTCCCCA CCTCCCGGCT CCAGTCCAGG 2700

GAGCTCCCAG GGAAGTGGTT GACCCCTCCG GTGGCTGGCC ACTCTGCTAG AGTCCATCCG 2760

CCAAGCTGGG GGCATCGGCA AGGCCAAGCT GCGCAGCATG AAGGAGCGAA AGCTGGAGAA 2820

GCAGCAGCAG AAGGAGCAGG AGCAAGGTGA GCGGGCCCTG GAGCTTGCAG TCGGAGGGCC 2880

TTGGGCAAGA TCGCCTCCTC CCCTCCAGCC CTGAGTCCAC CGGGTGCTTT CTGCCCACCC 2940

CCTGCTCTTG CCAGCTGGCC CCTGCTTCCC CTAGGGCACA TGCTGGAAGC CCTGGGCCGC 3000

CACCAGAGGT CCTCAGCCCT CCTGCCTGGG CTATGGCTCC TTCCTGGTTT GGGAGCCATA 3060

GTGGAGCTTT CCTCTCTAAG CTCACCCAGC TCAAACTGAC AGGAGAATCT TCTTCGACTG 3120

CCAAGAGCGG TCCAAGGCAA TGGTCAGCCA CTGCAGCCTC CTGAGATATT TTTAGAGACT 3180

GGACCTGAGG CCTCTGGAGG CTACTGATGA TGCCTGCTGT GAACGCAGAC ACTGGTGTGA 3240

TGCGATGCCT GCGCCTGCAG CGGCAGTGCC CTGGGCACTA TGGTTTTGAG CTTGTACCCA 3300

GCGCTGCTTT TGCCTTGCTC TGTGACCCCA GGCAAGCTGC CTCACCTCTC TGGGCCAGTT 3360

TCCCCATTGT ACAGTGGTGC TGCACACCCT GGCCCTGGCC CCGAGGTGGC TGGGAGGTGG 3420

CTCCTCAAAC AGCCGCTGTC TCATCAGTGC CCGGTGCTGG GTCAGGGATC GACTGAGGCT 3480

CTGAGCTAAC TGGGAAACAC AGTGGCCTTG GAGGGCTGGG GAGTGTCATG GGGGTGGGGA 3540

CAGGGAGTCA CCGGTCGCAT GTGACTGAAC TCTTCACCCC AGTCTGTGGC TTTCCCGTTG 3600

CAGTGAGAGC CACGAGCCAA GGTGGGCACT TGATGTCGGA TCTCTTCAAC AAGCTGGTCA 3660

TGAGGCGCAA GGGTAGGAGG CAGGGCCGCT GCCCGCCCTG GGCCAGCACC TTGTAATTCT 3720

GTCCTGCCTT TTTCTTCCTG TATTTAAGTC TCCGGGGGCT GGGGGAACCA GGGTTTCCCA 3780

CCAACCACCC TCACTCAGCC TTTTCCCTCC AGGCATCTCT GGGAAAGGAC CTGGGGCTGG 3840

TGAGGGGCCC GGAGGAGCCT TTGCCCGCGT GTCAGACTCC ATCCCTCCTC TGCCGCCACC 3900

GCAGCAGCCA CAGGCAGAGG AGGACGAGGA CGACTGGGAA TCGTAGGGGG CTCCATGACA 3960

CCTTCCCCCC CAGACCCAGA CTTGGGCCGT TGCTCTGACA TGGACACAGC CAGGACAAGC 4020

TGCTCAGACC TACTTCCTTG GGAGGGGGTG ACGGAACCAG CACTGTGTGG AGACCAGCTT 4080

CAAGGAGCGG AAGGCTGGCT TGAGGCCACA CAGCTGGGGC GGGGACTTCT GTCTGCCTGT 4140

GCTCCATGGG GGGACGGCTC CACCCAGCCT GCGCCACTGT GTTCTTAAGA GGCTTCCAGA 4200

GAAAACGGCA CACCAATCAA TAAAGAACTG AGCAG 4235

GTGCAAGCAT CTGAAGAGCT GCCGGGATGC AGCAGAGAGG AGCAGCTGGA AGCCGTGGCT 60

GCGCTCTCTT CCCTCTGCTG GGCGTCCTGT TCTTCCAGGG TGTTTATATC GTCTTTTCCT 120

TGGAGATTCG TGCAGATGCC CATGTCCGAG GTTATGTTGG AGAAAAGATC AAGTTGAAAT 180

GCACTTTCAA GTCAACTTCA GATGTCACTG ACAAACTTAC TATAGACTGG ACATATCGCC 240

CTCCCAGCAG CAGCCACACA GTATCAATAT TTCATTATCA GTCTTTCCAG TACCCAACCA 300

CAGCAGGCAC ATTTCGGGAT CGGATTTCCT GGGTTGGAAA TGTATACAAA GGGGATGCAT 360

CTATAAGTAT AAGCAACCCT ACCATAAAGG ACAATGGGAC ATTCAGCTGT GCTGTGAAGA 420

ATCCCCCAGA TGTGCATCAT AATATTCCCA TGACAGAGCT AACAGTCACA GAAAGGGGTT 480

TTGGCACCAT GCTTTCCTCT GTGGCCCTTC TTTCCATCCT TGTCTTTGTG CCCTCAGCCG 540

TGGTGGTTGC TCTGCTGCTG GTGAGAATGG GGAGGAAGGC TGCTGGGCTG AAGAAGAGGA 600

GCAGGTCTGG CTATAAGAAG TCATCTATTG AGGTTTCCGA TGACACTGAT CAGGAGGAGG 660 

AAGAGGCGTG TATGGCGAGG CTTTGTGTCC GTTGCGCTGA GTGCCTGGAT TCAGACTATG 720

AAGAGACATA TTGATGAAAG TCTGTATGAC ACAAGAAGAG TCACCTAAAG ACAGGAAACA 780

TCCCATTCCA CTGGCAGCTA AAGCCTGTCA GAGAAAGTGG AGCTGGCCTG GACCATAGCG 840

ATGGACAATC CTGGAGATCA TCAGTAAAGA CTTTAGGAAC CACTTATTTA TTGAATAAAT 900

GTTCTTGTTG TATTTATAAA CTGTTCAGGA ACTCTCATAA GAGACTCATG ACTTCCCCTT 960

TCAATGAATT ATGCTGTAAT TGAATGAAGA AATTCTTTTC CTGAGCAAAA AGATACTTTT 1020

TGATTCATCT TTGCTCTGGA ATGTATTACA TGTTTTCTTC CAACTGTTTG AAGGAGAATT 1080

TTGAATGTTT GCCACACCGC TGATACCCAA ATAATTTTTT AAATGAAGTG GAGCTTGTGG 1140

CTTCCTGATG TGTCACCAGA CAAAATATTC GCTTGGGATA TGTATTCTTT GTTTTTTGCT 1200

CCATGTACAC TTTCAGCTGT GAGTTAGTAT AGGGCGTATA CTTACCGGTT TAATGACCTC 1260

AACCTCAGTT GTGTTTGGAT AACTTAGGGT GTATACCCTT AGTTTCCTTA GAGTTGGTAG 1320

GATCAAGTCA TTGGTTTGCT TTGACTGGGT TTTTAAAGTA TTAAGTACAG TGTCATCAAT 1380

TTACAGTTAA GGAAAGGAAT CGTGAAGTAG AAAAATTATT TTCTTTAGTC TTGCTGGTAC 1440

AATTTGGGCT AAGGAGTCTT TGTTATTTTC TGTCTTGCTT TTTTTTTTTT TTTTTTTTTT 1500

TTGAGGCAGA GTCTCACTCT GTCGCCAGGC TGGAGTGCAG TGGTGTGATC TTGGCTCACT 1560

GCAACCTCTG CCTCCTGGGT TCAAGCGATT CTTGTGCCTC AGCCTCTCGA GTAGCTGGGA 1620

TTACAGGCAT GCGCCACCAC ACCCAGCTAA TTTTTGTGTT TTTAGTAGAG ACGGGGTTTC 1680

ACCATTTTGG CCAGGATGGT CTCAATCCCC TGACCTCGTG ATCCACCTGC CTCGGCCTCC 1740

CAAAGTGTTG GGATTACAGG CATGAGCCAC TGTGCTTGGC CTGTTATTTT ATTTTCTTAT 1800

AACTACAACT TTTCTTCTTG AATTTTCAGG TCAGAGGCAA GAAAAACTCT TTACAGGTTT 1860

TTAGTGGGGG GCTTATGGAG TATTTCAGGA GTTCTTTGCA AATTAAATCA TCTTTTCACT 1920

TGTATTGTTT TTCAAAACTT TGTTGATTTC TAAAATGTGC CAACTGTGAG TAAACTATGG 1980

TATTTGCAAG TGGTTTTTAC ATAATATTTG AGATGAGGAA GTGAGATTGT GCATGACATA 2040

CTTCTCCTTT GTATTCTCTC AGTGCCTTAC AGCAGGTTAC TCCATTCTGC TATGACAACT 2100

TGTTTCAAAT GTTAATTTAC ATAGGATTTT TTATAAGCCA TTAAGGCATA TGTATAGTAT 2160 

ATCAGTAAAG ATGGATGGTG CATATATAAA TAGTCTTCTG TAATAGTGAT TGGATTTACT 2220

TCTCAATTAT GAGAGACAAA AATTATCCCC TCACCTGTCT CTATTCTTTC AACAGGTTGA 2280

TGCCTTTTCA TGATTTTTCA TTAGGTGGTT CAGGAAGTTT CCATATTACA GGGCTTCAGA 2340

CTGTATATGT TAGTTTAAAA ATCACTTTTC TCTCTCTCAA CTTCTTTCTT TTTTTTTTGA 2400

AGACTTAATT TAAAAAATTT GGGTTGTTAG ATCCGTATGA TAGATTTGGC CTAGCCTCTT 2460

CTGTTAACCT AGTCCACAGA TGAGCGAATC TGGTTAGTTG AAGGACATTG TGATTTGACT 2520

CTGGTCACGC GAGGAAGTAG AAGGGCAAAG ACAGGACCGG CAGTTTACAT TTCCAGTGGT 2580

TAAACCTCAC GGTACTTTGG GACTGCTTGT TAACTTTTGT GGTTGTCTGA GGCCAATCTA 2640

ACGTGACCAT TTCTGACACC TCAACAGAGA GAGGAAAGCA ACTTGAGCAA TGAGAGTAAA 2700

TAACTTGGGC TCTCAGAGAT TTGAAGATAG AGATCTCATT GTGAGGGGGA CTATTTTGCA 2760

GGTCCTCATT TCTCCAAGAA AGAGATGGTG TTACAGGAAC CCACTGAAAG CCATATCCCA 2820

TTAAATGAGG AACTAATTTT GGCTGGGCCT TCTTGTAATG TCCTCGCAGG TGTGTTGTGA 2880

AGATTAATGC AGGGTAGTAT GTTTGTAGAT TGACACCTAG TCTAAACTTG AGGTAATTGG 2940

TGCTCTGTGA ATACTCAGTC GTGTTCTTTT ATAGCCTTAA TCATGATTTG AACTAGTCCC 3000

TTGCTTTTTA AATGACTGAA TGAAGTCCTT CGTGGTAAGG GAGTACGTTG ATAACTTAGT 3060

TTACTATATG GGTTTGTGGT CGCATCCCAG TCATCAGCTG CTATCATTTT CCTTCTTCAT 3120

CCCTTATACT GAGATTTGGG TTACAGCTTT TTATTCTTCG AAGGATCACA AAGCAGTGTA 3180

CAGACACCTG CCTTCTTTAA GGATGAAAGG AAGATAAAGT GGTCTTTTTT TGTTTACTTA 3240

TTTGTTTCAC CTCTTGTTTG AGTAACTTCT AAGGTGCTAT TCTCTCTCTC TTTTTGCTAC 3300

CTCATGAGCT CTTGTCACAG CCATGGAAAC CAGCCTCGTT TAGAAAGGGA ACTTAGTTCA 3360

GAAGGGGTTA AAAGCCTTCC AGAATTTTTC TTTAGCTGCT GAAGTTTTTA CATGTGGTTA 3420

CATGACTTTA AGTTTTATGC ATTACGCTCT TAATTCTATT ACAAAATGTG GACTCACCAA 3480

TTGCTTTGTG TTTTCCATGT GACCTGTTAC TTCAGGCTAC TTGGGGAACA TCTTAGTCCT 3540

CTGTAGCTCC TGAACCCAGC ACTGGTGCTT CAAGAGAGAA GGTAGCACGT CTTTGTTCAA 3600

AACAAAACAA AACGACACTT CTGGAGGCCA CATCCTGAAT ATGAATGTTC TACTAAGTCA 3660

CTCAGTTATG GTTCTAAAGG GAAACTGTAA GAAGACCCAC AAGGAGTGGA CCAAGACTAT 3720

TATTTAATTG CACAACTTGA AACTTTGCTG CCAGAAGAGG CAGCTCCATT CCTTTGACTC 3780

CAGTGTTGGG CTGTTAACTG CTGCACCTCA TTGCCTTTTT TTGTTTTTGT TTTTGTTTTG 3840

TAGGAGGGTA GGCACTGTTG GGCCATATGC ACAAATATTG TAACTCTTGG TATCTTTACT 3900

GCATCATAGT CAATAAACTT CTTTGTACCC TT 3932

GGCACGAGGC GGGCCAGCGA CGGGCAGGAC GCCCCGTTCG CCTAGCGCGT GCTCAGGAGT 60

TGGTGTCCTG CCTGCGCTCA GGATGAGGGG GAATCTGGCC CTGGTGGGCG TTCTAATCAG 120

CCTGGCCTTC CTGTCACTGC TGCCATCTGG ACATCCTCAG CCGGCTGGCG ATGACGCCTG 180

CTCTGTGCAG ATCCTCGTCC CTGGCCTCAA AGGGGATGCG GGAGAGAAGG GAGACAAAGG 240

CGCCGCCGGA CGGCCTGGAA GAGTCGGCCC CACGGGAGAA AAAGGAGACA TGGGGGACAA 300

AGGACAGAAA GGCAGTGTGG GTCGTCATGG AAAAATTGGT CCCATTGGCT CTAAAGGTGA 360

GAAAGGAGAT TCCGGTGACA TAGGACCCCC TGGTCCTAAT GGAGAACCAG GCCTCCCATG 420

TGAGTGCAGC CAGCTGCGCA AGGCCATCGG GGAGATGGAC AACCAGGTCT CTCAGCTGAC 480

CAGCGAGCTC AAGTTCATCA AGAATGCTGT CGCCGGTGTG CGCGAGACGG AGAGCAAGAT 540

CTACCTGCTG GTGAAGGAGG AGAAGCGCTA CGCGGACGCC CAGCTGTCCT GCCAGGGCCG 600

CGGGGGCAGG CTGAGCATGC CCAAGGACGA GGCTGCCAAT GGCCTGATGG CCGCATACCT 660

GGCGCAAGCC GGCCTGGCCC GTGTCTTCAT CGGCATCAAC GACCTGGAGA AGGAGGGCGC 720

CTTCGTGTAC TCTGACCACT CCCCCATGCG GACCTTCAAC AAGTGGCGCA GCGGTGAGCC 780

CAAGAATGCC TACGACGAGG AGGACTGCGT GGAGATGGTG GCCTCGGGCG GCTGGAACGA 840

CGTGGCCTGC CACACCACCA TGTACTTCAT GTGTGAGTTT GACAAGGAGA ACATGTGAGC 900

CTCAGGCTGG GGCTGCCCAT TGGGGGCCCC ACATGTCCCT GCAGGGTTGG CAGGGACAGA 960 

GCCCAGACCA TGGTGCCAGC CAGGGAGCTG TCCCTCTGTG AAGGGTGGAG GCTCACTGAG 1020

TAGAGGGCTG TTGTCTAAAC TGAGAAAATG GCCTATGCTT AAGAGGAAAA TGAAAGTGTT 1080

CCTGGGGTGC TGTCTCTGAA GAAGCAGAGT TTCATTACCT GTATTGTAGC CCCAATGTCA 1140

TTATGTAATT ATTACCCAGA ATTGCTCTTC CATAAAGCTT GTGCCTTTGT CCAAGCTATA 1200

CAATAAAATC TTTAAGTAGT GCAGTAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA 1257

CAATGCTACA TTAACCCATT ATGTAAGACC AATAAATGCA GAGCCAGCGT TTCAAGCACA 60

GGAAATACCA GCAGGCAGAA TGGCCAGTTT GCTTAAGAAT GGTGAGCCTG AAGCTGAGTT 120

ACATAAAGAA ACCACAGGTC CAGGCACTGC TGGCCCTCAG TCCAACACCA CATCTTCTCT 180

AAAAGGTGAA CGCAAAGGCA TGCACAGGCT GCAAGATGTG TCAACATGTG AAACAAAGGA 240

GCTATTGAAT GTCGGGGTTT CCTCCCTTTG TGCTGGTCCC TACCAAAATA CAGCAGACAC 300

CAAGGAAAAC CTCAGTAAAG AGCCTTTGGC CTCCTTTGTT TCAGAATCCT TTGATACTTC 360

TGTTTGTGGA ATAGCCACAG AGCACGTAGA AATTGAGAAC AGTGGGGAGG GGCTCAGGGC 420

TGAGGCTGGT TCTGAAACCC TAGGCAGAGA TGGAGAGGTC GGTGTGAATT CCGACATGCA 480 

CTATGAACTC TCTGGAGATT CTGATCTAGA CCTGCTTGGT GATTGTAGAA ATCCCAGACT 540 

GGATTTGGAG GATTCTTATA CTTTAAGAGG TAGTTACACC AGGAAAAAAG ATGTTCCCAC 600

AGATGGGTAT GAGTCGTCGT TGAACTTCCA CAACAACAAC CAAGAGGACT GGGGCTGCTC 660

TAGCCGGGTT CCAGGCATGG AGACGAGCCT CCCTCCCGGG CACTGGACTG CTGCGGTAAA 720

GAAAGAAGAG AAGTGTGTGC CGCCTTACGT CCAAATCCGA GATCTCCACG GGATCCTCAG 780

GACTTACGCC AACTTCTCTA TAACAAAAGA ACTCAAAGAT ACCATGAGAA CTTCACACGG 840

CCTGAGGAGG CACCCGAGTT TCAGTGCAAA CTGTGGCCTG CCCAGCTCCT GGACAAGCAC 900 

TTGGCAGGTG GCAGACGACC TCACCCAGAA CACTTTAGAC CTGGAGTATC TGCGTTTTGC 960 

ACATAAACTA AAACAGACCA TAAAGAATGG GGATTCTCAG CATTCTGCCT CCTCTGCCAA 1020 

TGTCTTTCCA AAGGAGTCAC CAACCCAGAT CTCCATTGGT GCTTTCCCTT CGACAAAAAT 1080 

CTCTGAGGCC CCATTTCTGG ATCCTGCACC TAGGAGCAGA AGCCCCCTTC TGGTAACAGC 1140 

TGTGGAGTCA GATCCCAGAC CACAGGGACA GCCCAGGAGA GGCTACACAG CCAGCAGTCT 1200

GGAGATCTCT TCCTCTTGGA GAGAGAGATG TAGTCATAAT AGAGATCTTA GAAATTCTCA 1260

AAGAAATCAC ACTGTTTCAT TCCACCTCAA CAAACTGAAA TACAACAGTA CTGTGAAGGA 1320

ATCTCGGAAT GATATTTCAC TTATTCTCAA TGAGTATGCT GAATTCAACA AGGTGATGAA 1380

GAATAGCAAC CAATTCATTT TCCAAGACAA AGAGCTAAAT GATGTTTCTG GAGAAGCCAC 1440

TGCTCAAGAG ATGTATCTGC CTTTCCCAGG ACGGTCAGCC TCCTATGAAG ACATAATCAT 1500

AGACGTGTGC ACCAATTTGC ACGTCAAACT AAGAAGTGTT GTGAAAGAGG CTTGTAAAAG 1560

TACCTTCCTG TTCTACCTTG TCGAAACAGA AGACAAATCA TTCTTTGTAA GAACAAAGAA 1620

CCTTCTGAGG AAAGGAGGCC ATACAGAAAT TGAACCTCAG CACTTCTGTC AAGCTTTCCA 1680

CAGAGAGAAT GATACACTAA TCATCATCAT CAGAAATGAA GATATATCAT CACATTTGCA 1740

TCAGATTCCT TCTTTGCTGA AGCTGAAGCA TTTCCCCAGT GTCATCTTTG CTGGAGTAGA 1800 

CAGCCCTGGA GATGTTCTTG ATCACACCTA CCAAGAACTG TTTCGTGCAG GAGGCTTTGT 1860

GATATCAGAT GACAAGATAC TAGAAGCTGT AACATTAGTT CAACTGAAGG AAATTATCAA 1920

AATCCTGGAA AAACTAAATG GAAATGGAAG ATGGAAGTGG TTGCTTCACT ACAGGGAAAA 1980

TAAAAAGCTA AAAGAAGATG AAAGAGTGGA TTCAACTGCA CATAAGAAGA ACATAATGTT 2040

GAAGTCATTT CAGAGTGCAA ATATCATTGA ATTGCTTCAT TATCACCAGT GTGACTCTCG 2100

ATCATCAACA AAAGCAGAAA TTCTGAAATG TTTGCTAAAC CTGCAAATTC AGCATATTGA 2160

TGCCAGGTTT GGTGTCCTCC TAACAGACAA GCCTACTATC CCCAGAGAAG TCTTTGAAAA 2220

TAGTGGAATC CTTGTTACAG ATGTAAATAA CTTTATAGAA AACATAGAAA AAATAGCAGC 2280

TCCATTTAGG AGTAGCTATT GGTGACTCAA CTACAGCCTG CCTGGATATG GATGATGCCA 2340

ATAAAAAATT AGTATTTTCC CTTTGGAAAA CTTGTGAAGA TGTGAATACA CATGTGAAGT 2400

CTTACATTTG AAAAACCAAT GTTCTACAAC TTGGAAAGTT TTCATTTTTT ATATTTTGCT 2460

GAAATATGTC ACAGTGGCAT TGCAGTTGTC TGTTAGCTTT GGGTTGCAGT GCTAGATATT 2520

GTTTTAAATT ATTTTCATTT TAAACAAGAT GCCTTCTAAG CTATTGAGCT TATTAAAAAT 2580

AATTTTACAT GTTTACTTAG TTGGAGCAAA AATAAGTCTA TTTTAAGGAA TAGCTTTGTT 2640

TTTGCTATGC TAATGTCTAG AAAGGCATAC GATGCTACTA TTATGCTCTG TTTTAAAGGT 2700

TTTACCTACC CTTGTAAAAA CTATAATCTT AAATGGTTTT ATTTGCTGTT TACTACTTAT 2760

ACATACTACT ACTATAAAAC TATTTTTTCC TAAATGGTAC AAATTTATAA ACTATCATTT 2820

TTCACTTACG GTATTTGTAA ATACTACTAC TACAAAAATC AGCTTTCCGA GAAAGAAATA 2880

ATCATTTATT TATGATATTG AAAATTTCTA CAGTAAACAC TCAAAACCAA GCAAAAAACA 2940

TTTGTAAGAT ACACGGTATC TATTTGGAGC AACGGTTTTT GTAACTAATG TGTTTCATTT 3000

TTTAAATAAA GACAACTAAA AATAAAAAAA AAAAAAAAAA A 3041

TGCCCAGGAG GAGTAGGAGC AGGAGCAGAA GCAGAAGCGG GGTCCGGAGC TGCGCGCCTA 60

CGCGGGACCT GTGTCCGAAA TGCCGGTGCG AGGAGACCGC GGGTTTCCAC CCCGGCGGGA 120

GCTGTCAGGT TGGCTCCGCG CCCCAGGCAT GGAAGAGCTG ATATGGGAAC AGTACACTGT 180

GACCCTACAA AAGGATTCCA AAAGAGGATT TGGAATTGCA GTGTCCGGAG GCAGAGACAA 240

CCCCCACTTT GAAAATGGAG AAACGTCAAT TGTCATTTCT GATGTGCTCC CGGGTGGGCC 300

TGCTGATGGG CTGCTCCAAG AAAATGACAG AGTGGTCATG GTCAATGGCA CCCCCATGGA 360 

GGATGTGCTT CATTCGTTTG CAGTTCAGCA GCTCAGAAAA AGTGGGAAGG TCGCTGCTAT 420 

TGTGGTCAAG AGGCCCCGGA AGGTCCAGGT GGCCGCACTT CAGGCCAGCC CTCCCCTGGA 480

TCAGGATGAC CGGGCTTTTG AGGTGATGGA CGAGTTTGAT GGCAGAAGTT TCCGGAGTGG 540 

CTACAGCGAG AGGAGCCGGC TGAACAGCCA TGGGGGGCGC AGCCGCAGCT GGGAGGACAG 600 

CCCGGAAAGG GGGCGTCCCC ATGAGCGGGC CCGGAGCCGG GAGCGGGACC TCAGCCGGGA 660 

CCGGAGCCGT GGCCGGAGCC TGGAGCGGGG CCTGGACCAA GACCATGCGC GCACCCGAGA 720 

CCGCAGCCGT GGCCGGAGCC TGGAGCGGGG CCTGGACCAC GACTTTGGGC CATCCCGGGA 780 

CCGGGACCGT GACCGCAGCC GCGGCCGGAG CATTGACCAG GACTACGAGC GAGCCTATCA 840

CCGGGCCTAC GACCCAGACT ACGAGCGGGC CTACAGCCCG GAGTACAGGC GCGGGGCCCG 900

CCACGATGCC CGCTCTCGGG GACCCCGAAG CCGCAGCCGC GAGCACCCGC ACTCACGGAG 960 

CCCCAGCCCC GAGCCTAGGG GGCGGCCGGG GCCCATCGGG GTCCTCCTGA TGAAAAGCAG 1020 

AGCGAACGAA GAGTATGGTC TCCGGCTTGG GAGTCAGATC TTCGTAAAGG AAATGACCCG 1080 

AACGGGTCTG GCAACTAAAG ATGGCAACCT TCACGAAGGA GACATAATTC TCAAGATCAA 1140

TGGGACTGTA ACTGAGAACA TGTCTTTAAC GGATGCTCGA AAATTGATAG AAAAGTCAAG 1200

AGGAAAACTA CAGCTAGTGG TGTTGAGAGA CAGCCAGCAG ACCCTCATCA ACATCCCGTC 1260

ATTAAATGAC AGTGACTCAG AAATAGAAGA TATTTCAGAA ATAGAGTCAA CCCGATCATT 1320

TTCTCCAGAG GAGAGACGTC ATCAGTATTC TGATTATGAT TATCATTCCT CAAGTGAGAA 1380

GCTGAAGGAA AGGCCAAGTT CCAGAGAGGA CACGCCGAGC AGATTGTCCA GGATGGGTGC 1440

GACACCCACT CCCTTTAAGT CCACAGGGGA TATTGCAGGC ACAGTTGTCC CAGAGACCAA 1500

CAAGGAACCC AGATACCAAG AGGAACCCCC AGCTCCTCAA CCAAAAGCAG CCCCGAGAAC 1560

TTTTCTTCGT CCTAGTCCTG AAGATGAAGC AATATATGGC CCTAATACCA AAATGGTAAG 1620

GTTCAAGAAG GGAGACAGCG TGGGCCTCCG GTTGGCTGGT GGCAATGATG TCGGGATATT 1680

TGTTGCTGGC ATTCAAGAAG GGACCTCGGC GGAGCAGGAG GGCCTTCAAG AAGGAGACCA 1740

GATTCTGAAG GTGAACACAC AGGATTTCAG AGGATTAGTG CGGGAGGATG CCGTTCTCTA 1800

CCTGTTAGAA ATCCCTAAAG GTGAAATGGT GACCATTTTA GCTCAGAGCC GAGCCGATGT 1860

GTATAGAGAC ATCCTGGCTT GTGGCAGAGG GGATTCGTTT TTTATAAGAA GCCACTTTGA 1920

ATGTGAGAAG GAAACTCCAC AGAGCCTGGC CTTCACCAGA GGGGAGGTCT TCCGAGTGGT 1980

AGACACACTG TATGACGGCA AGCTGGGCAA CTGGCTGGCT GTGAGGATTG GGAACGAGTT 2040

GGAGAAAGGC TTAATCCCCA ACAAGAGCAG AGCTGAACAA ATGGCCAGTG TTCAAAATGC 2100

CCAGAGAGAC AACGCTGGGG ACCGGGCAGA TTTCTGGAGA ATGCGTGGCC AGAGGTCTGG 2160

GGTGAAGAAG AACCTGAGGA AAAGTCGGGA AGACCTCACA GCTGTTGTGT CTGTCAGCAC 2220

CAAGTTCCCA GCTTATGAGA GGGTTTTGCT GCGAGAAGCT GGTTTCAAGA GACCTGTGGT 2280

CTTATTCGGC CCCATAGCTG ATATAGCAAT GGAAAAATTG GCTAATGAGT TACCTGACTG 2340

GTTTCAAACT GCTAAAACGG AACCAAAAGA TGCAGGATCT GAGAAATCCA CTGGAGTGGT 2400

CCGGTTAAAT ACCGTGAGGC AAGTTATTGA ACAGGATAAG CATGCACTAG TGGATGTGAC 2460

TCCGAAAGCT GTGGACCTGT TGAATTACAC CCAGTGGTTC TCAATTGTGA TTTCTTTCAC 2520

GCCAGACTCC AGACAAGGTG TCAACACCAT GAGACAAAGG TTAGACCCAA CGTCCAACAA 2580

TAGTTCTCGA AAGTTATTTG ATCACGCCAA CAAGCTTAAA AAAACGTGTG CACACCTTTT 2640

TACAGCTACA ATCAACCTAA ATTCAGCCAA TGATAGCTGG TTTGGCAGCT TAAAGGACAC 2700

TATTCAGCAT CAGCAAGGAG AAGCGGTTTG GGTCTCTGAA GGAAAGATGG AAGGGATGGA 2760

TGATGACCCC GAAGACCGCA TGTCCTACTT AACTGCCATG GGCGCAGACT ATCTGAGTTG 2820

CGACAGCCGC CTCATCAGTG ACTTTGAAGA CACGGACGGT GAAGGAGGCG CCTACACTGA 2880

CAATGAGCTG GATGAGCCAG CCGAGGAGCC GCTGGTGTCG TCCATCACCC GCTCCTCGGA 2940

GCCGGTGCAG CACGAGGAGA GCATAAGGAA ACCCAGCCCA GAGCCACGAG CTCAGATGAG 3000

GAGGGCTGCT AGCAGCGATC AACTTAGGGA CAATAGCCCG CCCCCAGCAT TCAAGCCAGA 3060

GCCGTCCAAG GCCAAAACCC AGAACAAAGA AGAATCCTAT GACTTCTCCA AATCCTATGA 3120

ATATAAGTCA AACCCCTCTG CCGTTGCTGG TAATGAAACT CCTGGGGCAT CTACCAAAGG 3180

TTATCCTCCT CCTGTTGCAG CAAAACCTAC CTTTGGGCGG TCTATACTGA AGCCCTCCAC 3240

TCCCATCCCT CCTCAAGAGG GTGAGGAGGT GGGAGAGAGC AGTGAGGAGC AAGATAATGC 3300

TCCCAAATCA GTCCTGGGCA AAGTCAAAAT ATTTGGAGAA GATGGATCAC AAGGGCCAGG 3360

GTTACAAGAG AATGCAGGAG CTCCAGGAAG CACAGAATGC AAGGATCGAA ATTGCCCAGA 3420

AGCATCCTGA TATCTATGCA GTTCCAATCA AAACGCACAA GCCAGACCCT GGCACGCCCC 3480

AGCACACGAG TTCCAGACCC CCTGAGCCAC AGAAAGCTCC TTCCAGACCT TATCAGGATA 3540

CCAGAGGAAG TTATGGCAGT GATGCCGAGG AGGAGGAGTA CCGCCAGCAG CTGTCAGAAC 3600

ACTCCAAGCG CGGTTACTAT GGCCAGTCTG CCCGATACCG GGACACAGAA TTATAGATGT 3660 

CTGAGCACGG ACTCTCCCAG GCCTGCCTGC ATGGCATCAG ACTAGCCACT CCTGCCAGGC 3720 

CGCCGGGATG GTTCTTCTCC AGTTAGAATG CACCATGGAG ACGTGGTGGG ACTCCAGCTC 3780

GTGTGTCCTC ATGGAGAACC CAGGGGACAG CTGGTGCAAA TTCAGAACTG AGGGCTCTGT 3840

TTGTGGGACT GGGTTAGAGG AGTCTGTGGC TTTTTGTTCA GAATTAAGCA GAACACTGCA 3900

GTCAGATCCT GTTACTTGCT TCAGTGGACC GAAATCTGTA TTCTGTTTGC GTACTTGTAA 3960 

TATGTATATT AAGAAGCAAT AACTATTTTT CCTCATTAAT AGCTGCCTTC AAGGACTGTT 4020

TCAGTGTGAG TCAGAATGTG AAAAAGGAAT AAAAAATACT GTTGGGCTCA AACTAAATTC 4080

AAAGAAGTAC TTTATTGCAA CTCTTTTAAG TGCCTTGGAT GAGAAGTGTC TTAAATTTTC 4140

TTCCTTTGAA GCTTTAGGCA GAGCCATAAT GGACTAAAAC ATTTTGACTA AGTTTTTATA 4200

CCAGCTTAAT AGCTGTAGTT TTCCCTGCAC TGTGTCATCT TTTCAAGGCA TTTGTCTTTG 4260 

TAATATTTTC CATAAATTTG GACTGTCTAT ATCATAACTA TACTTGATAG TTTGGCTATA 4320

AGTGCTCAAT AGCTTGAAGC CCAAGAAGTT GGTATCGAAA TTTGTTGTTT GTTTAAACCC 4380 

AAGTGCTGCA CAAAAGCAGA TACTTGAGGA AAACACTATT TCCAAAAGCA CATGTATTGA 4440

CAACAGTTTT ATAATTTAAT AAAAAGGAAT ACATTGCAAT CCGT 4484

TGCCCAGGAG GAGTAGGAGC AGGAGCAGAA GCAGAAGCGG GGTCCGGAGC TGCGCGCCTA 60

CGCGGGACCT GTGTCCGAAA TGCCGGTGCG AGGAGACCGC GGGTTTCCAC CCCGGCGGGA 120

GCTGTCAGGT TGGCTCCGCG CCCCAGGCAT GGAAGAGCTG ATATGGGAAC AGTACACTGT 180

GACCCTACAA AAGGATTCCA AAAGAGGATT TGGAATTGCA GTGTCCGGAG GCAGAGACAA 240

CCCCCACTTT GAAAATGGAG AAACGTCAAT TGTCATTTCT GATGTGCTCC CGGGTGGGCC 300

TGCTGATGGG CTGCTCCAAG AAAATGACAG AGTGGTCATG GTCAATGGCA CCCCCATGGA 360

GGATGTGCTT CATTCGTTTG CAGTTCAGCA GCTCAGAAAA AGTGGGAAGG TCGCTGCTAT 420

TGTGGTCAAG AGGCCCCGGA AGGTCCAGGT GGCCGCACTT CAGGCCAGCC CTCCCCTGGA 480

TCAGGATGAC CGGGCTTTTG AGGTGATGGA CGAGTTTGAT GGCAGAAGTT TCCGGAGTGG 540

CTACAGCGAG AGGAGCCGGC TGAACAGCCA TGGGGGGCGC AGCCGCAGCT GGGAGGACAG 600

CCCGGAAAGG GGGCGTCCCC ATGAGCGGGC CCGGAGCCGG GAGCGGGACC TCAGCCGGGA 660

CCGGAGCCGT GGCCGGAGCC TGGAGCGGGG CCTGGACCAA GACCATGCGC GCACCCGAGA 720

CCGCAGCCGT GGCCGGAGCC TGGAGCGGGG CCTGGACCAC GACTTTGGGC CATCCCGGGA 780

CCGGGACCGT GACCGCAGCC GCGGCCGGAG CATTGACCAG GACTACGAGC GAGCCTATCA 840

CCGGGCCTAC GACCCAGACT ACGAGCGGGC CTACAGCCCG GAGTACAGGC GCGGGGCCCG 900

CCACGATGCC CGCTCTCGGG GACCCCGAAG CCGCAGCCGC GAGCACCCGC ACTCACGGAG 960 

CCCCAGCCCC GAGCCTAGGG GGCGGCCGGG GCCCATCGGG GTCCTCCTGA TGAAAAGCAG 1020

AGCGAACGAA GAGTATGGTC TCCGGCTTGG GAGTCAGATC TTCGTAAAGG AAATGACCCG 1080

AACGGGTCTG GCAACTAAAG ATGGCAACCT TCACGAAGGA GACATAATTC TCAAGATCAA 1140

TGGGACTGTA ACTGAGAACA TGTCTTTAAC GGATGCTCGA AAATTGATAG AAAAGTCAAG 1200

AGGAAAACTA CAGCTAGTGG TGTTGAGAGA CAGCCAGCAG ACCCTCATCA ACATCCCGTC 1260

ATTAAATGAC AGTGACTCAG AAATAGAAGA TATTTCAGAA ATAGAGTCAA CCCGATCATT 1320

TTCTCCAGAG GAGAGACGTC ATCAGTATTC TGATTATGAT TATCATTCCT CAAGTGAGAA 1380

GCTGAAGGAA AGGCCAAGTT CCAGAGAGGA CACGCCGAGC AGATTGTCCA GGATGGGTGC 1440

GACACCCACT CCCTTTAAGT CCACAGGGGA TATTGCAGGC ACAGTTGTCC CAGAGACCAA 1500

CAAGGAACCC AGATACCAAG AGGAACCCCC AGCTCCTCAA CCAAAAGCAG CCCCGAGAAC 1560

TTTTCTTCGT CCTAGTCCTG AAGATGAAGC AATATATGGC CCTAATACCA AAATGGTAAG 1620

GTTCAAGAAG GGAGACAGCG TGGGCCTCCG GTTGGCTGGT GGCAATGATG TCGGGATATT 1680

TGTTGCTGGC ATTCAAGAAG GGACCTCGGC GGAGCAGGAG GGCCTTCAAG AAGGAGACCA 1740

GATTCTGAAG GTGAACACAC AGGATTTCAG AGGATTAGTG CGGGAGGATG CCGTTCTCTA 1800

CCTGTTAGAA ATCCCTAAAG GTGAAATGGT GACCATTTTA GCTCAGAGCC GAGCCGATGT 1860 

GTATAGAGAC ATCCTGGCTT GTGGCAGAGG GGATTCGTTT TTTATAAGAA GCCACTTTGA 1920

ATGTGAGAAG GAAACTCCAC AGAGCCTGGC CTTCACCAGA GGGGAGGTCT TCCGAGTGGT 1980

AGACACACTG TATGACGGCA AGCTGGGCAA CTGGCTGGCT GTGAGGATTG GGAACGAGTT 2040

GGAGAAAGGC TTAATCCCCA ACAAGAGCAG AGCTGAACAA ATGGCCAGTG TTCAAAATGC 2100

CCAGAGAGAC AACGCTGGGG ACCGGGCAGA TTTCTGGAGA ATGCGTGGCC AGAGGTCTGG 2160 

GGTGAAGAAG AACCTGAGGA AAAGTCGGGA AGACCTCACA GCTGTTGTGT CTGTCAGCAC 2220 

CAAGTTCCCA GCTTATGAGA GGGTTTTGCT GCGAGAAGCT GGTTTCAAGA GACCTGTGGT 2280

CTTATTCGGC CCCATAGCTG ATATAGCAAT GGAAAAATTG GCTAATGAGT TACCTGACTG 2340

GTTTCAAACT GCTAAAACGG AACCAAAAGA TGCAGGATCT GAGAAATCCA CTGGAGTGGT 2400

CCGGTTAAAT ACCGTGAGGC AAGTTATTGA ACAGGATAAG CATGCACTAC TGGATGTGAC 2460

TCCGAAAGCT GTGGACCTGT TGAATTACAC CCAGTGGTTC CCAATTGTGA TTTTTTTCAA 2520

CCCAGACTCC AGACAAGGTG TCAAAACCAT GAGACAAAGG TTAAATCCAA CGTCCAACAA 2580

AAGTTCTCGA AAGTTATTTG ATCAAGCCAA CAAGCTTAAA AAAACGTGTG CACACCTTTT 2640

TACAGCTACA ATCAACCTAA ATTCAGCCAA TGATAGCTGG TTTGGCAGCT TAAAGGACAC 2700

TATTCAGCAT CAGCAAGGAG AAGCGGTTTG GGTCTCTGAA GGAAAGATGG AAGGGATGGA 2760

TGATGACCCC GAAGACCGCA TGTCCTACTT AACCGCCATG GGCGCGGACT ATCTGAGTTG 2820

CGACAGCCGC CTCATCAGTG ACTTTGAAGA CACGGACGGT GAAGGAGGCG CCTACACTGA 2880

CAATGAGCTG GATGAGCCAG CCGAGGAGCC GCTGGTGTCG TCCATCACCC GCTCCTCGGA 2940 

GCCGGTGCAG CACGAGGAGG TGAGGCGAGG CAGGCCACGG GCAGGAACAG GAGAGCCTGG 3000 

TGTTTTCCTT GCACTCTCGT GGACAGCTGT GTGTTCAGGG TGCTGTGGAA GGCATTCCTA 3060 

AGGGTTGGAG CAGATGACTT CCAGGGAGTC TCTCGCTTTG AGTCCACGCT GGCATGGTTG 3120 

CAGTCTGTGG GGAAAGTGGG GCAGGCAGGT GGACTTCAGA AGAGCTTGGA GGGGTCAGCA 3180 

CTCCGCACAC CCATGCCCTC AGGTGCGATG GATAAACAGA ATGGCTTTAG GTGCCGTCTG 3240 

TCCAAATTAC CAGCGGAACC TTCCTTCCCA TGCAGTATTG TTGTATGTAC TTGTAACCTT 3300 

TGATTAGGTT TCTCTCTGTA CTCTTAGATG TCCTTGCTTT TCTTCCCCAT CCTGCCTTTA 3360 

ACCTTTCTAA TCTTGCCAAA GCTCTTGAGT GTTTCCCCAT CAGTTTCCTT CTCTCTTATA 3420 

TTTCAGTTTT TTAATTGAGT TCATGATCAA ACCTTCATCT GATCACATCA CATGTACTGT 3480

GCATCCACTG TGATTAGATA GCTTATGGGA TCCTTGAAAT CACATTGACA GGCACTGTAA 3540

AGTCACAGCC AAGTTAGCAA TTATTAGTTG CACCTCAGAG AATGTTGGAA TAATGATCTT 3600

TGAAGATGGG ATTGTTCATA TATTTGGATA ATTATTGCTG TGGATTTCTC TCTAGCATTT 3660

TAGCTCATTC CAGTAAATGA TTTTTTTCTT TATGAAATAG AACTCCCAAA AAAAAAAAAA 3720

AAAAAAAAA 3729

CAAGCCTGGA AGAACTCGTC ATGCTCTTTG TAGCGTGGTG CTTCTGTTGC TCACAGGACA 60

ACTTGCCTTT GATGATTTTC AAGAGAGTTG TGCTATGATG TGGCAAAAGT ATGCAGGAAG 120

CAGGCGGTCA ATGCCTCTGG GAGCAAGGAT CCTTTTCCAC GGTGTGTTCT ATGCCGGGGG 180

CTTTGCCATT GTGTATTACC TCATTCAAAA GTTTCATTCC AGGGCTTTAT ATTACAAGTT 240

GGCAGTGGAG CAGCTGCAGA GCCATCCCGA GGCACAGGAA GCTCTGGGCC CTCCTCTCAA 300

CATCCATTAT CTCAAGCTCA TCGACAGGGA AAACTTCGTG GACATTGTTG ATGCCAAGTT 360

GAAGATTCCT GTCTCTGGAT CCAAATCAGA GGGCCTTCTC TACGTCCACT CATCCAGAGG 420

TGGCCCGTTT CAGAGGTGGC ACCTTGACGA GGTCTTTTTA GAGCTCAAGG ATGGTCAGCA 480

GATTCCTGTG TTCAAGCTCA GTGGGGAAAA CGGTGATGAA GTGAAAAAGG AGTAGAGACG 540

ACCCAGAAGA CCCAGCTTGC TTCTAGTCCA TCCTTCCCTC ATCTCTACCA TATGGCCACT 600 

GGGGTGGTGG CCCATCTCAG TGACAGACAC TCCTGCAACC CAGTTTTCCA GCCACCAGTG 660

GGATGATGGT ATGTGCCAGC ACATGGTAAT TTTGGTGTAA TTCTAACTTG GGCACAACAA 720 

ATGCTATTTG TCATTTTTAA ACTGAATCCG AAAGAAACTC CTATTATAAA TTTAAGATAA 780

TGTAATGTAT TTGAAAGTGC TTTGTATAAA AAAGCACATG ATAAAAGGAA TCAGAATTAA 840

TAAAATGTTT GTTGATCTTT AAAAAAAAAA AAAAAAAAAC TCGAGACTAG TTCTGTCTCT 900

CCCTCGTGCC GAATTCGGCA CGAGGCAGAG CCTCTTCTCG TCTGTAGGAA CACCGCCAGG 960

GAGGTCATGG CAGGGCAGGA CCAAAGGGTC CTGTGGCTCT TTTTTTTTCT CCTGTTCTGC 1020

ATTCCTGCCC ACACCCCCAC CCCTCCATTT CCTTCTGCTC TGGAGGCATC CTCCTTCATT 1080

GGACACCACA CAGTTTATTT CACTTGTGAC TTCAAGGTTG TGAATTCTTC CCATGGGTTA 1140

AGTCCTGGGA TACTTCTGCA GTGAAAGGAG GTCTTGTACC TCTTCCTCAG AGTCAGAAGT 1200

TCTGAGTACC TTTGCCCTAT TCTGAAAAGG GCTAGGGGCT CCTGCTCCCA GCTGCCCTCT 1260

TCCTTTGGCT TCCAATTCAG TTCCCTCTGC CCCGCATCCT GCAGACAGGC GCTCCCGCAG 1320

GGGGCCCTTG TGGACCTGCA CTGGAGTCTG TTGCCTTCAC TGAGCTGCCT GTGCTGGCCT 1380

TGCATGGTGC CTGTAGGGGG ATTTGCTTTG CTGTGCCATT GGGGTACAGC TGCTGCTCTT 1440 

ACTCTAGACC AAAAAGTCGG GTTGAGTGAC TGGTGGCAGG GCCACAGATA GAGACAGCGG 1500 

GGAGGGTGGC TGACCCTGGC GGCCCTGGAC TGAGCGTCTG GAGGAGTCGT GGAGGCTCTT 1560 

TCCCTTCTTT CTCCTCTGAG AGCTCGTTCT TCAGGCTCTT CCAGCTTGTC ATGTCGAGTG 1620 

CCTGGCCACT GCTCAGGGTT GGAGGCTCAG TCCCTTTGCC CTGTCTGTTC CAGCTCTGGA 1680 

GCTAACTCAG GGATCCCTGA TCAGGGTTAC ATAGGTTTGG TAAAATGAGT GCTGGAAATT 1740 

AACTTTCTCC CAGTAGTCTT AGGTCATGCT CAGTGAACTT AAACTTTATC CAGATATGGT 1800

TTTCCTTCAG CCTTTCTATT CCCTTTCTAG CCAGTGAAAG ACCCGCTGCC CTTTGACCTC 1860 

AGCCCCTCCA AGCCCCCAAG TTTAAAACGC CACCCCCTGC CGGCCCTGGA CTGAGCGTCT 1920 

GGAGGAGTCG TGGAGGCTCT TTCCCTTCTT TCTCCTCTGA GAGCTCGTTC TTCAGGCTCT 1980 

TCCAGCTTGT CATGTCGAGT GCCTGGCCAC TGCTCAGGGT TGGAGGCTCA GTCCCTTTGC 2040 

CCTGTCTGTT CCAGCTCTGG AGCTAACTCA GGGATCCCTG ATCAGGGTTA CATAGGTTTG 2100 

GTAAAATGAG TGCTGGAAAT TAACTTTCTC CCAGTAGTCT TAGGTCATGC TCAGTGAACT 2160 

TAAACTTTAT CCAGATATGG TTTTCCTTCA GCCTTTCTAT TCCCTTTCTA GCCAGTGAAA 2220 

GACCCGCTGC CCTTTGACCT CAGCCCCTCC AAGCCCCCAA GTTTAAAACG CCACCCCCTG 2280 

CCACCAGAAA AAACAGAAAA AAAAAAAAAA AAAAAACTAA AACACCCATC TGGTCTGGGC 2340 

ATCTTCCTTT CCTTTTTCAC TATGTATCCT GTTACTGGGC TTAAACAGCT TTCAGAGAAG 2400 

AGATGTCATT TCTATTAAAT GCTCTTTCAG TAGCGAACTG AGTTCACACT TGACTAAGGA 2460 

TATTTTCCGG ACTGTCTGTC ATCAGCATCC TTAGTGGGTT TCCCCATATT TAAATTGGTA 2520

GAGGCCAGGG ATGGTGGCTC ACACCTGTAA TCTCAGTACT TTGGGAGGCC AAGGTAGGTG 2580 

GATTGCTTGA GCTCAGAAGA CCAGCCTGGG CAACCTGGTG AAACCGTGTC TCTACTAAAA 2640 

ATTCAAGTTA GCTAGCTGGG CATGGTGATG CACTTCTGTA GTCCCAGCTA CTTGGAGAGG 2700 

GGGTGGTGCT GGGGCAGCAG GATCGCTTGA ACCCAGGAGG TTGAGGTTGC AGTGAGCCAA 2760

GATGGTACCA GCCTAGGTGA CAAAGTGACA CCCTGTCTCA AAAAAGAAAC CAAACAAACA 2820

TAAAAAAAAA AAAAAAAAA 2839

AGCGGAGGCG GCGGCGGCGG CGGCGGCGGC AGAGGGAGTT TCCGCTTTGC ACTCCACCCC 60

GGTAGCAGCT CCGCGGCAGG GACAGCTTCC TCCGGACGCT TGGCGGGCTT CGCTCTCGCC 120

TTACGACAGC CCGGTCGGAT CATGGGTTTG CCCAGGGGGC CGGAGGGCCA GGGTCTCCCG 180

GAGGTGGAAA CAAGAGAAGA TGAAGAACAA AATGTCAAGT TGACTGAAAT TCTGGAGCTC 240

TTGGTTGCAG CTGGGCATTT CAGGGCAAGA ATTAAAGGCT TATCACCCTT TGACAAGGTA 300

GTAGGAGGAA TGACTTGGTG TATCACCACT TGCAACTTTG ATGTAGATGT TGATTTGCTC 360

TTTCAAGAAA ACTCTACGAT AGGTCAAAAA ATAGCTCTGT CAGAAAAAAT TGTCTCGGTC 420

CTGCCAAGGA TGAAATGCCC ACACCAGCTG GAGGCCCACC AGATCCAGGG GATGGATTTT 480

ATTCACATAT TTCCTGTTGT TCAGTGGCTG GTGAAACGAG CTATAGAAAC AAAAGAAGAG 540

ATGGGTGACT ATATCCGCTC CTACTCTGTA TCCCAGTTCC AGAAGACTTA CAGTCTCCCT 600

GAGGATGATG ACTTCATAAA GAGAAAAGAA AAGGCCATCA AGACAGTTGT GGACCTCTCA 660 

GAAGTGTACA AGCCCCGTCG GAAATAGAAA CGCCACCAGG GAGCAGAGGA GCTACTTGAT 720

GAAGAATCTC GAATCCATGC TACACTTTTG GAATATGGCA GGAGATATGG ATTTAGCTGC 780

CAGAGCAAAA TGGAGAAGGC TGAGGACAAG AAAACGGCAC TTCCAGCAGG GCTGTCAGCT 840

ACAGAAAAAG CTGATGCCCA CGAGGAAGAT GAGCTTCGAG CAGCTGAAGA GCAGCGTATT 900 

CAGTCGCTGA TGACCAAGAT GACCGCTATG GCAAATGAGG AGAGCCGTCT CACCGCAAGC 960 

TCCGTGGGCC AGATTGTGGG ACTCTGCTCT GCTGAGATCA AGCAGATTGT GTCCGAGTAT 1020

GCAGAGAAGC AGTCTGAGCT ATCAGCTGAA GAAAGTCCAG AAAAATTAGG AACCTCCCAG 1080

CTACATCGCC GGAAAGTCAT TTCCTTGAAC AAACAGATTG CGCAAAAGAC CAAACATCTT 1140

GAAGAGCTGC GAGCAAGTCA CACCAGCCTA CAAGCCAGAT ATAATGAAGC CAAGAAAACG 1200

CTGACAGAGC TGAAGACTTA CAGTGAGAAA CTGGACAAAG AGCAAGCAGC CCTCGAGAAG 1260

ATAGAATCCA AAGCTGATCC AAGTATCCTA CAGAACCTGA GAGCACTTGT AGCCATGAAT 1320

GAAAATCTGA AAAGTCAAGA ACAGGAATTT AAAGCACATT GTCGAGAGGA GATGACACGA 1380

CTACAGCAAG AAATTGAAAA CCTGAAAGCT GAGAGAGCAC CACGTGGAGA TGAAAAGACC 1440

CTCTCCAGTG GAGAGCCGCC TGGTACCTTG ACCTCTGCAA TGACTCATGA CGAAGACCTA 1500

GACAGACGGT ATAATATGGA GAAAGAGAAA CTTTACAAGA TACGTTTACT ACAGGCTCGA 1560

AGAAATCGAG AAATAGCAAT TTTGCACCGC AAGATTGATG AAGTCCCTAG CCGTGCCGAG 1620

CTAATACAGT ATCAGAAGAG ATTTATTGAA CTCTACCGCC AGATTTCAGC AGTGCACAAA 1680

GAAACCAAGC AGTTCTTCAC TTTATATAAT ACCCTGGATG ATAAAAAGGT TTATTTGGAA 1740

AAAGAGATTA GTCTGCTGAA CTCAATTCAT GAGAACTTCT CACAGGCCAT GGCCTCCCCT 1800

GCTGCCCGGG ACCAGTTTTT ACGTCAGATG GAACAGATTG TGGAAGGAAT TAAGCAAAGT 1860

AGAATGAAGA TGGAAAAGAA AAAGCAAGAG AACAAAATGA GAAGAGACCA GTTGAACGAC 1920

CAGTACTTGG AGCTGTTAGA AAAGCAGAGG CTATACTTTA AGACTGTGAA AGAGTTCAAG 1980

GAGGAGGGCC GCAAGAACGA GATGCTGCTG TCCAAGGTGA AAGCGAAGGC CTCCTGAACA 2040

TCCCCAGCCG TGGCTGTATG TCATTGATTT TACTTTTAAG CACCGTATAT CACCTACAAG 2100

ATCATGAAAT GGTTCTGAAA GCGACAGTAG AGAGATGCAG TTGTGATGAT TTCAACAACC 2160

TGGATGTTTT CTTTCTCCTC TTTGCTTCCA TTCATCTCTG TTGGCTGCTG TTGATGGAGT 2220

CAGACAGTAA ACACGTGGCT TGGATAACAC CCATCATCCT ATGAAGAATA TAGGGAGTAC 2280

TTGTTCTCTG TTGATTCAAC TTTTATGTCT CCAGTAACAT TGCGCTTATG AAGGTACCTG 2340

TATTTGTATG GACTCTGAAT AAAGAAGAAT TCATTTGTTT AGCAAGTATT AGTTCAGCAA 2400

CCACTGAGAA ATAAGCACTG AGGAAGATTC AGAGACGTGT AAAACACAGT TCCTACTGCA 2460

CAAGTACCCA GCAGGTGGCC CAGGGAGGCA GATACAGCAC ACTTGACCGC AGAACTGGGC 2520

TATCCAAGAT GTTTTTCAGT AAACAGAAGG CATTTAGCTG AAATGATCAG CCCATGTAGT 2580

GTTGGTCACT TGGGCCTTTC ACCTGCCATG GTACCTTTTG TTCCCAGCTC CTCCAGGTGC 2640

CAGCCAGCAG GCTTGGTGGT GACAGCAACT GGAACGAAAG TTCAGTGTTG TTTTAATTTT 2700

TATACGTTAC TCAAGTTGAT TTCTCAGAAA ATTGAAAACA GACCTTGTGC TGAGGACACG 2760

TCAATAAAAA TTATACCTTC CCCTACAAAA AAAAAAAAAA AA 2802

GGAATTCCGT CGACGGCAGC GGCGGCGGCG GGTGGGAAAT GGCGGAGTAT CTGGCCTCCA 60

TCTTCGGCAC CGAGAAAGAC AAAGTCAACT GTTCATTTTA TTTCAAAATT GGAGCATGTC 120

GTCATGGAGA CAGGTGCTCT CGGTTGCACA ATAAACCGAC GTTTAGCCAG ACCATTGCCC 180

TCTTGAACAT TTACCGTAAC CCTCAAAACT CTTCCCAGTC TGCTGACGGT TTGCGCTGTG 240

CCGTGAGCGA TGTGGAGATG CAGGAACACT ATGATGAGTT TTTTGAGGAG GTTTTTACAG 300

AAATGGAGGA GAAGTATGGG GAAGTAGAGG AGATGAACGT CTGTGACAAC CTGGGAGACC 360

ACCTGGTGGG GAACGTGTAC GTCAAGTTTC GCCGTGAGGA AGATGCGGAA AAGGCTGTGA 420

TTGACTTGAA TAACCGTTGG TTTAATGGAC AGCCGATCCA CGCCGAGCTG TCACCCGTGA 480

CGGACTTCAG AGAAGCCTGC TGCCGTCAGT ATGAGATGGG AGAATGCACA CGAGGCGGCT 540

TCTGCAACTT CATGCATTTG AAGCCCATTT CCAGAGAGCT GCGGCGGGAG CTGTATGGCC 600 

GCCGTCGCAA GAAGCATAGA TCAAGATCCC GATCCCGGGA GCGTCGTTCT CGGTCTAGAG 660 

ACCGTGGTCG TGGCGGTGGC GGTGGCGGTG GTGGAGGTGG CGGCGGACGG GAGCGTGACA 720

GGAGGCGGTC GAGAGATCGT GAAAGATCTG GGCGATTCTG AGCCATGCCA TTTTTACCTT 780

ATGTCTGCTA GAAAGTGTTG TAGTTGATTG ACCAAACCAG TTCATAAGGG GAATTTTTTA 840

AAAAACAACA AAAAAAAAAC ATACAAAGAT GGGTTTCTGA ATAAAAATTT GTAGTGATAA 900

CAGT 904

CGGGTGGTTG AGTGGAAGCG GTCGCCATGT CCGCGGGGAG CGCGACACAT CCTGGAGCTG 60

GCGGGCGCCG CAGCAAATGG GACCAACCAG CTCCAGCCCC ACTTCTCTTC CTCCCGCCAG 120

CGGCCCCAGG TGGGGAGGTC ACCAGCAGTG GGGGAAGTCC TGGGGGCACC ACAGCTGCTC 180

CTTCAGGAGC CTTGGATGCT GCTGCTGCTG TGGCTGCCAA GATTAATGCC ATGCTCATGG 240 

CAAAAGGGAA GCTGAAACCA ACTCAGAATG CTTCTGAGAA GCTTCAGGCT CCTGGCAAAG 300

GCCTAACTAG CAATAAAAGC AAGGATGACC TGGTGGTAGC TGAAGTAGAA ATTAATGATG 360 

TGCCTCTCAC ATGTAGGAAC TTGCTGACTC GAGGACAGAC TCAAGACGAG ATCAGCCGAC 420

TTAGTGGGGC TGCAGTATCA ACTCGAGGGA GGTTCATGAC AACTGAGGAA AAAGCCAAAG 480

TGGGACCAGG GGATCGTCCA TTATATCTTC ATGTTCAGGG CCAGACACGG GAATTAGTGG 540

ACAGAGCTGT AAACCGGATC AAAGAAATTA TCACCAATGG AGTGGTAAAA GCTGCCACAG 600

GAACAAGTCC AACTTTTAAT GGTGCAACAG TAACTGTCTA TCACCAGCCA GCACCCATCG 660 

CTCAGTTGTC TCCAGCTGTT AGCCAGAAGC CTCCCTTCCA GTCAGGGATG CATTATGTTC 72 0 

AAGATAAATT ATTTGTGGGT CTAGAACATG CTGTACCCAC TTTTAATGTC AAGGAGAAGG 78 0 

TGGAAGGTCC AGGCTGCTCC TATTTGCAGC ACATTCAGAT TGAAACAGGT GCCAAAGTCT 84 0 

TCCTGCGGGG CAAAGGTTCA GGCTGCATTG AGCCAGCATC TGGCCGAGAA GCTTTTGAAC 90 0 

CTATGTATAT TT AC AT CAGT CACCCCAAAC CAGAAGGCCT GGCTGCTGCC AAGAAGC TTT 960 

GTGAGAATCT TTTGCAAACA GTTCATGCTG AATACTCTAG ATTTGTGAAT CAGATTAATA 102 0 

CTGCTGTACC TTTACCAGGC TATACACAAC CCTCTGCTAT AAGTAGTGTC CCTCCTCAAC 10 8 0 

CACCATATTA TC CAT C CAAT GGC TAT CAGT CTGGTTACCC TGTTGTTCCC CCTCCTCAGC 1140 

AGCCAGTTCA ACCTCCCTAC GGAGTAC CAA GCATAGTGCC ACCAGCTGTT T C ATT AG C AC 1200 

CTGGAGTCTT GCCGGCATTA CCTACTGGAG TCCCACCTGT GCCAACACAA TACCCGATAA 1260 

CACAAGTGCA GCCTCCAGCT AGCACTGGAC AGAGTCCGAT GGGTGGTCCT TTTATTCCTG 1320 

CTGCTCCTGT CAAAACTGCC TTGCCTGCTG GCCCCCAGCC CCAGCCCCAG CCCCAGCCCC 13 80 

CACTCCCAAG TCAGCCCCAG GCACAGAAGA GACGATTCAC AGAGGAGCTA CCAGATGAAC 1440 

GGGAATCTGG ACTGCTTGGA TACCAGCATG GACCCATTCA TATGACTAAT TTAGGTACAG 15 00 

GCTTCTCCAG TCAGAATGAG ATTGAAGGTG CAGGATCGAA GCCAGCAAGT TCCTCAGGCA 15 60 

AAGAGAGAGA GAGGGACAGG CAGTTGATGC CTCCACCAGC CTTTCCAGTG ACTGGAATAA 162 0 

AAACAGAGTC CGATGAAAGG AATGGGTCTG GGACCTTAAC AGGGAGC CAT GGTGAGTGTG 168 0 

ATATAGCTGG GGGAACAGGG GAGTGGCTAA GACTGGTCTA AAGCTATTAG TTTTCTCAGC 174 0 

CGGGCGCAGT GGCTCACGCC TGTAATCCCA GCACTTTGGG AGGC CGAGGT GGGCAGATCA 18 0 0 

CCTAAGGTCA GGAGTTCAAG ACCAGCTTGG CCAACATAGT GAAATCCCAT CTCTACTAAA 18 60 

AATACAAAAA CTAGCGGGCA TGGTGGTGGG CGCCTGTAAT TCCAGCTACT CAGGGGGTTG 192 0 

AGGCAGGAGA ATCGCTTCAA CCTGGGAGGC AGAGGTTGCA GTGAGCCAAG ATCAGACCAC 198 0 

TGCCCTCCAG CCTGGGCAAT AGAGCAAGAC TCCATCTCAT AAATAAATAA ATACATAAAT 2 04 0 

AAAGC TATTA ATTTTCTAAC CTGATGTTCA TTCAGGTGTT TAATCCAACC TCTATAATCT 210 0 

GTTGGCCAGT GAAAATACTT TTGGGCTGGG CACGGTGGCT CACGCCTGTA ATCCCAGCAC 2160 

TTTGGGAGGC CAAGGTGGGC GGATAACCTG AGGTCAGGAG TTTGAGACCA GCGTGGCTAA 222 0 

CACGGTGAAA CCCCGTCTCT ACTAAAAATA GAAAAATTAA GCTGGGCATG GTGGTGCATG 22 80 
CCTGTAATTC CAGCGGCTTG GAAGGCTGAG GCAGGAGAAT CACTTGAACT TGGGAGGTGG 2340

AGGTTGCAGT GGGCCGAGAT CACACCACTG CATTCCAGCC TGGGCACTAG AGTG AG ACT C 240 0 

TGTCT CAAAA AAAAAGAAAG AGAAAGAGAA AATAGTTTCT AAAAAATTGT ATACAGACAA 24 60 

CCTTTTATTT CCAACAAACG TGTGCCGAGA GAGAGAGAGA GAAAATAGTT TTAAAAAAAT 2 52 0 

TGTATACAGA CAACCTTTTG TTTCCAACCA ACGTGTATCT AGAAAAGAGT TAGTCGACTT 25 8 0 

ATTTTATACA TAGCATCAGT GAATAGTAAT GAGTGGTAGG TCATTTCAAA ATCCTGTTGC 2 64 0 

CTATATTATG TGAATACCAG GAGGTGATCT GATACGGACT TAATAAAGGT TGATTTTGCT 2 70 0 

TTATATTGGG AGCTGAGCCA CACCTCCCCT TATAACTCTA TTGGTCAGTA ATGGTCAGTT 2 760 

TGTGGCTGTT AGGAAAATGT TGCCTTTTAG CATTCCAGAA CTCTAAATCC TGTAGAGGTA 2 82 0 

CATGGGATAT TTTATTCTTT GCCTGTACTC ATAAAAATGA ACAGAAGAAA ATACGTTTTT 28 80 

TTCTTTTCTT AACTTCTTTT CTTTTAACTC TTTAAAAGGT GAAATATCAG CCCTCAAGAG 2 94 0 

ACT G ACTTGC TAACTTTCCT TTTTTTCTTT TTTTTTCTTT TTTTTGTGTT TCTTTTTTCT 300 0 

TTCTCTGTTT TCTTACATGG TTCTGGTGGA TTCACATTTG CTGATGCTGG TGCTGTTTTT 3 0 60 

CGTGTGATCT TCAACGTTTT TGGGTGAC C A TTGACCCTGT GACCTCAAAA TGGTGTC CAA 312 0 

CTAACCACTT AAAATTAACA TCTTTTTTTT AATTAACGAA TTTATGGTAT TTTTTTTTTT 318 0 

CCCTTGGCGG GGATGGGGTT GGGGTTGTTT TTTCTCTATT CTAGATTATC CAGCCAAGAA 324 0 

GATGAAAACT ACAGAGAAGG GATTTGGCTT GGTGGCTTAT GCTGCAGATT CAT CTGATG A 330 0 

AGAGGAGGAA CATGGAGGTC ATAAAAATGC AAGTAGTTTT CCACAGGGCT GGAGTTTGGG 33 60 

ATACCAATAT CCTTCATCAC AACCACGAGC TAAACAACAG ATGCCATTCT GGATGGCTCC 342 0 

CTAGGAAACA GTGGAACAGA GTTTTGACCC TCAGTGACTC TTCTTAGCAA TAATGCATGC 34 8 0 

ATTTGATTTA ACAAGAC TCT GGGGCCTGTG CTGGGAACCA TCTGGACCTT TGCAGAAGTT 3 54 0 

AGAGATTCAG TGCCCCCCTT TCTTAAAGGG GTTCCTTAAC AACCACAAAA ATCCTTATTT 3 60 0 

CTGCAGTGGC ATAGAAT CTG TTAAAATTTA ATTAGAATCA CAAATTTATC TCAGAAGCTT 3 660 

TTTAACAGTT GGTGAAATGT GCTTGTCCAA CAAAGCATCC TAACAGGGTC GTTCCCATAC 3 72 0 

ACATTTGACC TGGTCAGCCT TTTCCAGGTG AATAGC C CCA GTTCTGACAT AAAGAAAGTT 3 78 0 

TTATTTGTAT TTTACTACTG TTTGGTCAAT TTTGATATAT AACTGGTTAC AAACAGAGCC 3 84 0 

TTACTATTTA TTAGTGGGGA AATGATTTTA AGACCGTCCT TTTCAGTATT TAATTCTGAC 3 90 0 

AGATCTGCAT CCCTGTTTTG TTTTGGATTA TTTCTGTTTT GGAAAATGCT GTCTCATTTA 3 96 0 

AAACTGTTGG ATATAGCTGG ATCCTGGATA GGAAAATGAA ATTATTTTTT CATTGTGTTT 4 02 0 

TTTAATTGGG GTGATCCAAA GCTGGCACCT TCAGGCACAT TGGTCTCATA GCCATTACTG 4 08 0 

TTTTTATTGC CCTTCTAAGA TCCTGTCTTC AGCTGGGTCA GAGAAAACTT CTTGACTAAA 414 0 

ACTGGTCAGA ACTCATCACA GAAATGAAAT ACAGTGGTCT CTCTCTCCCA GAACTGGTTG 42 0 0 

CAGCTAAAAC AGAGAGATCT GACTGCTGGC TATAGGATTT TGGACTTAAT GACTGAAATT 42 60 

GCAAATTGTC CTTTTTCTTG GCATTACAGA TTTTGCCAAA ATAACTTTTT GTAT CAAAT A 4 32 0 

TTGATGTGTG AAAGTGAAGG AGCTAGTCTG CTGAACCAGG AATAGTTTGA GATATTGAAC 43 8 0 

TGTCATTTTT GCACATTTGA ATACTTTGCA GGCTGGCTTT GTATAAACTT ATCCTCTGGT 444 0 

TTCCTATATG TTGTAAATAT TT AGAC CAT A ATTTCATTAT AAATAAATCT ATAAATATTC 4 50 0 



Seq ID NO: 16 
Primekey #: 421221 
Coding sequence : 

1 11 21 31 41 51 

I I I l l I 

TCGACTGCCA AAGCAATGAA GCTTGCGGCC GCGGCCACAG TCATGGCCTT TCCCCCTGGT 6 0 

GCTCTTCATC CTTTACCAAA GAGACAAGCA CTTGAAAAAA GCAATGGTAC CAGCGCGGTC 12 0 

TTTAACCCCA GCGTCTTGCA CTACCAGCAG GCTCTCACCA GCGCACAGTT GCAGCAACAC 18 0 

GCCGCGTTCA TTCCAACAGG TATGTGCCCT TACTGCCCTA CGTCCTGTGC CCTTCTGGTC 24 0 

ATGTG CTTTC TTCTCATTTC TCTAAGCTGT TTGGTGGCAT CTAGTTTGCT TTTGAAGGTA 3 00 

TAATACAGTT TGAAATTCAT CGTTGTCCTA GCTATCTAAA TGTATTTACC TTACTTTGAA 3 60 

TGATAGCTAA AGAC TGTTAG GATTCTAAAG CCAAATATTT GATAGATTGA AGAGACAGAT 42 0 

TTAACCCATG AGAAACAGCA GTTAGGGCTT TTGGTTTCTT GTATTTGCAC AAGCCCTGTA 48 0 

AAATTGTTTA TGTAAATAAG ACCTTTTATG TGTGACAATT GAAATTTGTC CTTAACTCTG 54 0 

AATGACCTAA AAATAGCAAT TCCAGTAAAT ACTAACCATT TTTTTCTATT TCTATTCAGA 6 00 

GCACTAAAAC AATGAGGCTA TTCAAATTAA AGCAATTCTC TACTCATATT TTTATATTCA 66 0 

TTCTATCTCT TTCTCCATCC TTCTCAACTT TCACCAAGTT CACAAGTATA TAGAGCTCTT 72 0 

ATCCTCAGTG TCTAAGCCAA TGCCTGATAC TATTACGTAC GATGTGCATT AACTATGATT 78 0 
CCACTAAAAG ATCCATTGTA ATAGTCATAG AATCTTAGAG TTTAAAGGAC TCTTAGTGAT 840

CTCCTCATCC AGCTGATTGT TTTACAGATG AGAAAACTGA GGCCCCCTAA ATGAGAAGTG 90 0 

ACTTTCCAAG GTGCCACAAC TAATGAGAAA AAGAAC TGAG TTTCCCTGTG ACCAAACCCA 960 

TTTAGATCAC ATTCTACCAC CTGGGCCCGC CTATATATAC ACATTCGACA GAGTTCTCCT 102 0 

GAAAAAAAAA AAAAGCAGAT AAAAGTGAAT TTTTAAATAA CTGACCCCAA AAAGTCAGAT 10 8 0 

AAAAGTAAAA AAACAAAAGT ATAAATCATG TCATCCCTCC CCCATTTGCA CCGACATCTC 114 0 

T AAC C AC AG A CACACACACG CACACCATAC GCAAAGATAG TCACCATAAT TGACCATGTT 12 0 0 

TTTCACCTTT TAGTCAATGT TAGAAGCAAG GGGTAACTTA AGTCCTGGTG GGAAGACCAT 12 6 0 

CCATTGAGTT CTTTGAAAGT CAACATTTTT CAGCCCACGA TAGTGAAATG AAAGTAAATA 132 0 

TAAATGAATA ACAATT CTAA CAAAAAGAGT TTTTTGATTC AAATCCATTA GTTTGAACTT 13 8 0 

TTCGAGCTTA TTATCCATTT CCTTAAATCC CATAGCTTAT CAGAGTTAAC ATCAGAGGGA 1440 

GGTAAAATAT TTCTGTGATA TTCTTTGTAT AAAATCTACA CTTTGAAATG GATTAGTAAC 15 0 0 

CTGTGAACAA TACATATTTT AGTTAACATA TAAATTATGT GAGCAAAGTG GTTTTCAGTG 15 6 0 

TTTTTTTCTT ATTTTAGTTT TGAACCTGTC TTAAACTCAC AGACTTGTAG AAGAAATCTC 162 0 

TAATTCAGTA TTTATTAGGA GTTCACTTTT GCCCTATTAC AGCCTTAATT AGTGACATCC 168 0 

CAGTGCTGTT ACAGCATAGC AGTGTCTTAA TATGTAATCT AATTGAAATA ACACATTTGT 174 0 

AAAATAATTA C TAGAAGGT A AACTTACGTT AATGTCCTGT GTGGTTTCTA CAAAGTGTGT 18 0 0 

CATTGTAGAC CTCTTGGCCA CTAGATATTT TAAGATAAAA AAAAAAAAAA ATCGACGCGG 18 6 0 

CCGCGAATTT AGTAGTAGTA GTAGGC 18 8 6 



Seq ID NO: 17 
Primekey #: 429766
Coding sequence : 

1 11 21 31 41 51 

I 1 I I I I 

CGGCACGAGG GCTGCTAAGA AGGCAGACAG CACCAAGCGC TAAATGAGAT GGGGCACCTG 60 

GTGGTCTTCT GTGCTACTGG TAGGGGTGCA GCAGAGTGGT CAGTCTGGAC AGTAGCTGAC 12 0 

ATCACGTGAC CCAACACACG CATTCCTGGC TACTTACCAA GGAGAATAGA AAGCAGGCAG 18 0 

ATCTCTACAG CAGCTCTCTA CCTGATTGCA AAACAATGGA AATGCCCACA TGTCCACAAA 24 0 

CAAGTGTGTG GTCTGCCTGT GCCATGAAGC ACAGTGTGGC TGAGCGTCAA GAGTCCCCAC 3 00 

ACTCAAAGGA GGCAGCAGAT ACAGGGCTGC ACACTGTGTG ATT CC AC AC A TGTGACATTC 3 60 

TGGACACGGA CATGCTGGAT GGCAAAACGA GCATCGGGCT GAGAGGACTG C TGAG AAGGG 42 0 

GAACGGGGCT GCTGGGATGT GGGTTGATTG TAGCAGTAGC TCATGGAGAT GTGACCTCAA 48 0 

AAGAGTGATT TTTACTATGT GCATACTATA CCTCCACAAA CTTGACTTTA AAAAAATAAA 54 0 

ATATTCACAG AAAAAAACAA AAACAAATGT AAAACCATCA GACTACTTTA TCAGAGGTGT 60 0 

TATTTTTAGA TAGAGGTCTT TGAACT C CAT CCTAGGAACA TTGT AC C CAT GTCCTCCCAG 660 

AACTGCATCT TGCACTGGGT GTCGGAAGAC AGCCCTGCAA GACCTGTATG CTCTGTACCA 720 

TTCAGTGGTT TTTAAGGTTA ACTACCAGAA GTCATATCTG AGGCCTCCCA GAAGCATTAC 780 

TCTAAGGAAA GTAGTTAAAT GTGGACAGTG AC AG C AG AAA CATTTACACA TTAAACCAGT 840 

TTATAGAACA TGANNNNNNN NNNimtSTNNAA AGAAGCTTGT CAGCTCAATG ACTTACGAGG 90 0 

CGTGGGCCAT TAAAAAAAAA GGTCTGGAGT TTGGGAAGGA GAAAGGAATG GGGATGTGCA 960 

GCT CAAGAGT GTGATTTTTA CTATGTGCAT AGTATACAGT GTGGAGACTT GAC TTTAGGA 1020 

AAGTAAAATA TTCACAGAAA AA 1042 



Seq ID NO: 18 
Primekey #: 450628
Coding sequence : 

1 11 21 31 41 51 

1 I I I I I 

CAACTTCACG GACGCATTCA AGACCATGCT ATCATGGGAA ATCTGGTTAT GTTGTAATTT 60 

TTAATATAAT TAAGGTAAAG CTTAAATGTG CTGTTACGTG ATTTCCTTTT AAAGTTTAAG 12 0 

GTTATCTACC TTTGATATTC TCTGTAGATA TTAGTTGAAC ATAGTTCTCA C C AAAGT TAG 18 0 

CTATCCAAAT TCAGGAAAAG CAAAACTATT TTTCCTTTTC TTTAAAAAGA AAACTTTGAT 24 0 
TCATTTACTA GATTGTAAAC TTTTTTTTAA CTTCAAAAAT AATAAAAGGG TATGCAGGGA 300

AAAATCTTCC TCTCACCTGT CAGAGCTACT TTTTAAATAT GAAATAAGAG AAAACAAGTA 3 60 

GCTGCTTATA AGGTGATGTG ATTACACTTA TAAAAGATGA ATTTAGAAAA CAACATTCAT 42 0 

TGTCTAATTT AAATGGTCAA TAGAATCTTT ATTTTCTTTC TCCATAAGAC ATCCAGCTTC 4 80 

ACAGCTTCAT GTGCTACCTA GAACTGATGA TGCCACAAAT CCTTAAATGT CCTAAATGGT 54 0 

ACTGTTAAGT GAATCGTGCA ATTAGAATTT TCACCCAAAC AGAAGGGAAA CTGATTTTAG 60 0 

ATGTGATTGG GCTTCTTGAG GACATTTCTG TGGTCTCGTT TTATTGTTTT TTTTTTTAGC 660 

TTTGTTACTA TCTTAAATTC TTTGGTTATC AGCCTAGCAC TAAATGACCT TTAATTAAAA 72 0 

AAAAAAAAAA AATCGTGCCG 74 0 



Seq ID NO: 19 
Primekey #: 450177 
Coding sequence : 

1 11 21 31 41 51 

I I I I I I 

AATAGAATGA ATCCAATTTC TTGCCTTGGG TTACTGACTC TTTCAATTGT AACTAAGTAC 6 0 

AATAGCAGTT AAGCTCAAGC TGTAATAGTA GAGCTCAGTG GAAGCTAAAC CAGGCACAGT 12 0 

AACTGACACC ATGTAGGTTG ATTATATTTT GCATCTCCCT GCAAGTCTGT TTTATGTTAT 18 0 

TTATAGCTTC CTATTCGTGT AGACACCAGC AGTAAACTGG GGAATATTTG TGGCAGGAAT 2 40 

TTCTAAGAAC AACCTTTAGC ATCATCTCAG GCCCTGATCC ATTTCCTTTT CCACAAAATT 3 00 

GTTTGAGATT ATATCGTATG TGTTACAGAA AGAATGTTTT TCTGTATGCT CGAAACTGTA 3 60 

T AC TAAAG T A AAATAATAAA GTTAACCAGA ATTATCCATG GGGAACAATT CCAATTAAAA 42 0 

TAAAATGCCA GTATCTGGTA AAACCTGGTA GTAATGCTTT TTGTGGTGAT ATCCAGGTAA 4 80 

TGATTAGATG CAGTAAACCC GGGTAGTAGG GAAGAAGAGA GATGTGGGGA CAAGCAGCCC 54 0 

GAATACCTTG CTGGCATAGC AGCTGCCTAC CTGCACCCGG AGAC CTGAGC AGATATTACT 6 00 

AGGGTATTAT TTGACAGCCA GCTTAGCAGT CAAGAAGGAC ATTGATTTGG GGTAGCATGG 6 60 

CAGACCACTT CATTGGGGCT GAAGAC CTGC ATTTATTGAT CACTTACTAC ATGCCACGTA 72 0 

TTTCGTTTAG GATATATATG TGTGCATGTG TATAATTTTA AAATATACCC CACGGTAGAG 78 0 

GCAGAGCTGT TGGCAGTGAG CCGAGATCGC GCCACTGCAT TCCAGCCTGA GCGACAGAGC 84 0 

GAGACTCTGT CTCAAAAAA 859 
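
The listings in this section share one layout: residues in groups of ten, six groups per row, and a running residue count at the end of each row. As a minimal sketch (not part of the original disclosure), the Python below shows one way such rows could be parsed and the running counts checked. The function names `parse_listing_rows` and `check_counters` and the sample rows are illustrative assumptions, and any margin line numbers would need to be stripped from a row before it is passed in.

```python
def parse_listing_rows(rows):
    """Return (sequence, counters) from rows such as 'ACGT... ACGT... 60'.

    Alphabetic tokens are treated as residues; digit tokens are joined to
    form the row's running count, so a split count like '6 0' reads as 60.
    Blank rows are skipped.
    """
    sequence_parts = []
    counters = []
    for row in rows:
        tokens = row.split()
        if not tokens:
            continue
        residues = "".join(t for t in tokens if t.isalpha())
        digits = "".join(t for t in tokens if t.isdigit())
        sequence_parts.append(residues.upper())
        counters.append(int(digits) if digits else None)
    return "".join(sequence_parts), counters


def check_counters(rows):
    """Return (row_index, stated_count, actual_count) for each mismatching row."""
    mismatches = []
    running = 0
    non_blank = [r for r in rows if r.strip()]
    for i, row in enumerate(non_blank):
        seq, counters = parse_listing_rows([row])
        running += len(seq)
        stated = counters[0]
        if stated is not None and stated != running:
            mismatches.append((i, stated, running))
    return mismatches


# Illustrative rows only (a synthetic first row, then a short final row).
rows = [
    "ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT 6 0",
    "",
    "GAGACTCTGT CTCAAAAAA 79",
]
sequence, counters = parse_listing_rows(rows)
print(len(sequence), counters, check_counters(rows))
```

A row whose stated count disagrees with the accumulated length is reported rather than corrected, since a mismatch can come from either a damaged count or a dropped residue.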



Seq ID NO: 20
Primekey #: 407618
Coding sequence:

1 11 21 31 41 51

I I I I I I

TGCGCTACTT TTTTTGAGCC TGGGCGACAG ATTGAGACTC CGTCTCAAAA AAAAGAAAAA 60 

AAAAAGAATG CTTTCATCAG CAAAACATTG TAACATTCCC TTTACTTGAG GGCGTCCACA 12 0 

ATAC CGTAAG GTTGCGTGAA CTGTCCTACT GAATCTTCAT GGTTGCTTGG ATTTTAATCA 18 0 

CATCAGAAGA ATTTGAGAGC ATAC C ATGGC TGGCAGTCCA TAAAAGACTA GTTAGGAACA 24 0 

TCAGCTTTTA ATCATCGACC CTGCTTTCAG GTTTCATTTT AAACTTATAG AAGAGGGGAA 3 00 

GACATCAGTG TGCTTATTTG GCCTTTACTC TAAATCTTAA AAGGAAGAAA ATTTTAATAT 3 60 

TTCTTAGTTT GAGCCCAGGT GCGGTGTCTC ACGCCTGTAA TCACAGCACT TTGGGAGGCC 42 0 

AAGGCAGGCG GATCACTTGA GGTCAGGAGT TCAAGACCAG CCTGCAACGT GGTGAAACCC 48 0 

TGTCTGTACT AAAAATTAAA AAAAAAAAAA AAAAAATTAG CCGGGCGTGG TGGCAGTCGC 54 0 

CTGTAGTCCC AGCAACTCCA GAGGCTGAGA CAGGAGAATC GCTTGAACCC CAGAGGTGGA 60 0 

GGTTGCAGTG AGCTGAGATG GTGCCACTGC ACTCCAGCCG TGGGCGACAG AGCCAGACTG 660 

CATCTTGTGG GTGTAAAAAA AAAAATTTGT AGTTTGAGAG TCAACTTTTT CCTCACAGCT 72 0 

TTCTGAAAAT GTGGCCCTTT GGATGCTGAT AAAAGCTGGT GGTGATTTTA ACACCTTAGT 7 80 

AGCCAGAATC GAGACTGTCA TGGGGCACTT TTAAAATCTC ACCACGATTT GACTCCCATT 84 0 

CACAAGGTAG CCATTGGGGC TCAGTCTCCC TGAATGCTCC TGCAAAAGTG CAGTCTGCCA 90 0 

AGGTTTTCTC T AG AATAAT C TCGGTGTGTG TTCACTGTAA CAGTTCTGAG TTAC AC C C AG 960 

AGTTCATTCG GTTAACATTG TTCCTACCAG GCAAGACTTC TGGTGTTAGA AG 1012 
Seq ID NO: 21
Primekey #: 435937
Coding sequence : 

1 11 21 31 41 51 

I I I I I I 

CATGATTACG GATTTTAATC CGCCTCATTA TAGGGAATTT GGCCCTCGAG GCCAAGAATT 60 

CGGCCCCCAG GCACAGAAGA GACGATTCAC AGAGGAGCTA CCAGATGAAC GGGAATTTGG 12 0 

ACTGCTTGGA TACCAGGTTA AATAAAATAC CCTGTTTTCC TATCTTCACC TTATTCTTCT 18 0 

ACTATATTCT CCCTTTAAAA AAGATAAATT CACATCATTC TCCCAGTACT AGGATTTCTG 240 

CTTTCTGGAA TTCATTTTGG TTAGGTTTTT TATCCTATTC AACAGACTCT TGAAAGCCTC 3 00 

TGAGAGTTCT TACTTTCTTA TACATCTCAC TCAAAGCTCT TGATCTACCA GTATGTGGTT 3 60 

TGTATTTAAA ACCTTGGCTT TCAGTGGTGC TCTCTCTTTT ACCCTCCACC TAAAAAAGAG 42 0 

AGTGATATCT CCCTCCAGTC TCCCCACCCC TCAAGACTGC TAGAAAAGGA GTGATTCTGT 4 80 

ACATGTAATT GTAAAGTTAG CCACTAAAGT TAAAAAGATT CTTAATTTGT AGTTTTGGTG 540 

CAATTTTATC AGAAGT AC C T TTCCATTTTG CCAGAATCCT TGAATCATTC TTTAAACCAA 60 0 

AGCATTTTTT TATAGTTTCT AGCTAGGTTT AT AG AAAC T A GTGGAGCTAT GGGCAGTCAG 660 

T T AAAAAC AG GCCATAGATA GCATAATGAA TTATAAC AC C CCTGTCCAAG TCCTATAGAG 72 0 

AAAAAAAAAA AAAAA 735 



PROTEIN SEQUENCES 

Seq ID NO: 22 
Primekey #: 446619 

1 11 21 31 41 51 

I I I I I I 

MRIAVICFCL LGITCAIPVK QADSGSSEEK QLYNKYPDAV ATWLNPDPSQ KQNLLAPQTL 6 0 

PSKSNESHDH MDDMDDEDDD DHVDSQDSID SNDSDDVDDT DDSHQSDESH HSDESDELVT 12 0 

DFPTDLPATE VFTPWPTVD TYDGRGDSW YGLRSKSKKF RRPDIQYPDA TDEDITSHME 180 

SEELNGAYKA IPVAQDLNAP SDWDSRGKDS YETSQLDDQS AETHSHKQSR LYKRKANDES 24 0 

NEHSDVIDSQ ELSKVSREFH SHEFHSHEDM LWDPKSKEE DKHLKFRISH ELDSASSEVN 3 00 



Seq ID NO: 23 
Primekey #: 408199 



1 11 21 31 41 51



MQQRGAAGSR GCALFPLLGV LFFQGVYIVF SLEIRADAHV RGYVGEKIKL KCTFKSTSDV 60 

TDKLTIDWTY RPPSSSHTVS IFHYQSFQYP TTAGTFRDRI SWVGNVYKGD ASISISNPTI 120 

KDNGTFSCAV KNPPDVHHNI PMTELTVTER GFGTMLSSVA LLSILVFVPS AVWALLLVR 18 0 

MGRKAAGLKK RSRSGYKKSS IEVSDDTDQE EEEACMARLC VRCAECLDSD YEETY 235 



Seq ID NO: 24 
Primekey #: 421221 

1 11 21 31 41 51 

I I I I I I 

MALNVAPVRD TKWLTLEVCR QFQRGTCSRS DEECKFAHPP KSCQVENGRV IACFDSLKGR 6 0 

CSRENCKYLH PPTHLKTQLE INGRNNLIQQ KTAAAMLAQQ MQFMFPGTPL HPVPTFPVGP 12 0 

AIGTNTAISF APYLAPVTPG VGLVPTEILP TTPVIVPGSP PVTVPGSTAT QKLLRTDKLE 18 0 
VCREFQRGNC ARGETDCRFA HPADSTMIDT SDNTVTVCMD YIKGRCMREK CKYFHPPAHL 240

QAKIKAAQHQ ANQAAVAAQA AAAAATVMAF PPGALHPLPK RQALEKSNGT SAVFNPSVLH 3 00 

YQQALTSAQL QQHAAFIPTG SVLCMTPATS IVPMMHSATS ATVSAATTPA T S VP FAATAT 3 60 

ANQIILK 367 



Seq ID NO: 25
Primekey #: 449491

1 11 21 31 41 51

I I I I I I

MASSPAVDVS CRRREKRRQL DARRSKCRIR LGGHMEQWCL LKERLGFSLH SQLAKFLLDR 60

YTSSGCVLCA GPEPLPPKGL QYLVLLSHAH SRECSLVPGL RGPGGQDGGL VWECSAGHTF 120

SWGPSLSPTP SEAPKPASLP HTTRRSWCSE ATSGQELADL ESEHDERTQE ARLPRRVGPP 180

PETFPPPGEE EGEEEEDNDE DEEEMLSDAS LWTYSSSPDD SEPDAPRLLP SPVTCTPKEG 240

ETPPAPAALS SPLAVPALSA SSLSSRAPPP AEVRVQPQLS RTPQAAQQTE ALASTGSQAQ 300

SAPTPAWDED TAQIGPKRIR KAAKRELMPC DFPGCGRIFS NRQYLNHHKK YQHIHQKSFS 360

CPEPACGKSF NFKKHLKEHM KLHSDTRDYI CEFCARSFRT SSNLVIHRRI HTGEKPLQCE 420

ICGFTCRQKA SLNWHQRKHA ETVAALRFPC EFCGKRFEKP DSVAAHRSKS HPALLLAPQE 480

SPSGPLEPCP SISAPGPLGS SEGSRPSASP QAPTLLPQQ 519



Seq ID NO: 26
Primekey #: 429766

1 11 21 31 41 51

I I I I I I

MAHGSQEAEA PGAVAGAAEV PREPPILPRI QEQFQKNPDS YNGAVRENYT WSQDYTDLEV 60

RVPVPKHWK GKQVSVALSS SSIRVAMLEE NGERVLMEGK LTHKINTESS LWSLEPGKCV 120

LVNLSKVGEY WWNAILEGEE PIDIDKINKE RSMATVDEEE QAVLDRLTFD YHQKLQGKPQ 180

SHELKVHEML KKGWDAEGSP FRGQRFDPAM FNISPGAVQF 220



Seq ID NO: 27
Primekey #: 448518

1 11 21 31 41 51

I I I I I I

MLGAETEEKL FDAPLSISKR EQLEQQVGGV GQRWRQVQWP RALPELLSSQ GCWAPYSTHG 60

RCTQGLVGCP CRSLSPLTCP CLILQVPENY FYVPDLGQVP EIDVPSYLPD LPGIANDLMY 120

IADLGPGIAP SAPGTIPELP TFHTEVAEPL KTYKMGY 157



Seq ID NO: 28 
Primekey #: 421999 

1 11 21 31 41 51

I I I I I I

MQQRGAAGSR GCALFPLLGV LFFQGVYIVF SLEIRADAHV RGYVGEKIKL KCTFKSTSDV 60

TDKLTIDWTY RPPSSSHTVS IFHYQSFQYP TTAGTFRDRI SWVGNVYKGD ASISISNPTI 120

KDNGTFSCAV KNPPDVHHNI PMTELTVTER GFGTMLSSVA LLSILVFVPS AVWALLLVR 180

MGRKAAGLKK RSRSGYKKSS IEVSDDTDQE EEEACMARL 219



Seq ID NO: 29
Primekey #: 450628

1 11 21 31 41 51

I I I I I I

MRGNLALVGV LISLAFLSLL PSGHPQPAGD DACSVQILVP GLKGDAGEKG DKGAPGRPGR 60

VGPTGEKGDM GDKGQKGSVG RHGKIGPIGS KGEKGDSGDI GPPGPNGEPG LPCECSQLRK 120

AIGEMDNQVS QLTSELKFIK NAVAGVRETE SKIYLLVKEE KRYADAQLSC QGRGGTLSMP 180

KDEAANGLMA AYLAQAGLAR VFIGINDLEK EGAFVYSDHS PMRTFNKWRS GEPNNAYDEE 240

DCVEMVASGG WNDVACHTTM YFMCEFDKEN M 271



Seq ID NO: 30
Primekey #: 450628

1 11 21 31 41 51

I I I I I I

MASLLKNGEP EAELHKETTG PGTAGPQSNT TSSLKGERKA IHTLQDVSTC ETKELLNVGV 6 0 

SSLCAGPYQN TADTKENLSK EPLASFVSES FDTSVCGIAT EHVEIENSGE GLRAEAGSET 12 0 

LGRDGEVGVN SDMHYELSGD SDLDLLGDCR NPRLDLEDSY TLRGS YTRKK DVPTDGYESS 18 0 

LNFHNNNQED WGCSSRVPGM ETSLPPGHWT AAVKKEEKCV PPYVQIRDLH GILRTYANFS 24 0 

I TKELKDTMR TSHGLRRHPS FSANCGLPSS WTS TWQVADD LTQNTLDLEY LRFAHKLKQT 3 00 

IKNGDSQHSA SSANVFPKES PTQISIGAFP STKISEAPFL HPAPRSRSPL LVTAVESDPR 3 60 

PQGQPRRGYT ASSLDISSSW RERCSHNRDL RNSQRNHTVS FHLNKLKYNS TVKESRNDIS 42 0 

LILNEYAEFN KVMKNSMQFI FQDKELNDVS GEATAQEMYL PFPGRSASYE DIIIDVCTNL 480 

HVKLRSWKE ACKSTFLFYL VETEDKSFFV RTKNLLRKGG HTEIEPQHFC QAFHRENDTL 54 0 

IIIIRNEDIS SHLHQIPSLL KLKHFPSVIF AGVDS PGDVL DHTYQELFRA GGFVISDDKI 600 

LEAVTLVQLK EIIKILEKLN GNGRWKWLLH YRENKKLKED ERVDSTAHKK MIMLKSFQSA 660 

NIIELLHYHQ CDSRSSTKAE ILKCLLNLQI QHIDARFAVL LTDKPTI PRE VFENSGILVT 720 

DVNNFIENIE KIAAPFRSSY W 741 



Seq ID NO: 31 
Primekey #: 408806

1 11 21 31 41 51 

I I I I I I 

MPVRGDRGFP PRRELSGWLR APGMEELIWE QYTVTLQKDS KRGFGIAVSG GRDNPHFENG 60 

ETSIVISDVL PGGPADGLLQ ENDRWMVNG TPMEDVLHSF AVQQLRKSGK VAAIWKRPR 12 0 

KVQVAALQAS PPLDQDDRAF EVMDEFDGRS FRSGYSERSR LNSHGGRSRS WED S PERGRP 180 

HERARSRERD LSRDRSRGRS LERGLDQDHA RTRDRSRGRS LERGLDHDFG PSRDRDRDRS 240 

RGRSIDQDYE RAYHRAYDPD YERAYSPEYR RGARHDARSR GPRSRSREHP HSRSPSPEPR 300 

GRPGPIGVLL MKSRANEEYG LRLGSQIFVK EMTRTGLATK DGNLHEGDI I LKINGTVTEN 360 

MSLTDARKLI EKSRGKLQLV VLRDSQQTLI NIPSLNDSDS EIEDISEIES TRSFSPEERR 420 

HQYSDYDYHS SSEKLKERPS SREDTPSRLS RMGATPTPFK STGDIAGTW PETNKEPRYQ 480 

EEPPAPQPKA APRTFLRPSP EDEAIYGPNT KMVRFKKGDS VGLRLAGGND VGIFVAGIQE 54 0 

GTSAEQEGLQ EGDQILKVNT QDFRGLVRED AVLYLLEIPK GEMVTILAQS RADVYRDILA 600 

CGRGDSFFIR SHFECEKETP QSLAFTRGEV FRWDTLYDG KLGNWLAVRI GNELEKGLIP 660 

NKSRAEQMAS VQNAQRDNAG DRADFWRMRG QRSGVKKNLR KS REDLTAW SVSTKFPAYE 72 0 

RVLLREAGFK RPWLFGPIA DIAMEKLANE LPDWFQTAKT EPKDAGSEKS TGWRLNTVR 78 0 

QVIEQDKHAL LDVTPKAVDL LNYTQWFSIV ISFTPDSRQG VNTMRQRLDP TSNNSSRKLF 84 0 

DHANKLKKTC AHLFTATINL NSANDSWFGS LKDTIQHQQG EAVWVSEGKM EGMDDD PEDR 90 0 

MSYLTAMGAD YLSCDSRLIS DFEDTDGEGG AYTDNELDEP AEEPLVSSIT RSSEPVQHEE 960 

SIRKPSPEPR AQMRRAASSD QLRDNSPPPA FKPEPSKAKT QNKEESYDFS KSYEYKSNPS 102 0 

AVAGNE TPGA STKGYPPPVA AKPTFGRS I L KPSTPIPPQE GEEVGESSEE QDNAPKSVLG 108 0 

KVKIFGEDGS QGPGLQENAG APGSTECKDR NCPEAS 1116 
Seq ID NO: 32
Primekey #: 408806 

1 11 21 31 41 51 

I I I I I I 

MPVRGDRGFP PRRELSGWLR APGMEELIWE QYTVTLQKDS KRGFGIAVSG GRDNPHFENG 60 

ETSIVISDVL PGGPADGLLQ ENDRWMVNG TPMEDVLHSF AVQQLRKSGK VAAIWKRPR 12 0 

KVQVAALQAS PPLDQDDRAF EVMDEFDGRS FRSGYSERSR LNSHGGRSRS WEDSPERGRP 18 0 

HERARSRERD LSRDRSRGRS LERGLDQDHA RTRDRSRGRS LERGLDHDFG PSRDRDRDRS 24 0 

RGRSIDQDYE RAYHRAYDPD YERAYSPEYR RGARHDARSR GPRSRSREHP HSRSPSPEPR 3 00 

GRPGPIGVLL MKSRANEEYG LRLGSQIFVK EMTRTGLATK DGNLHEGD I I LKINGTVTEN 3 60 

MSLTDARKLI EKSRGKLQLV VLRDSQQTLI NIPSLNDSDS EIEDISEIES TRSFSPEERR 42 0 

HQYSDYDYHS SSEKLKERPS SREDTPSRLS RMGATPTPFK STGDIAGTW PETNKEPRYQ 48 0 

EEPPAPQPKA APRTFLRPSP EDEAIYGPNT KMVRFKKGDS VGLRLAGGND VGIFVAGIQE 54 0 

GTSAEQEGLQ EGDQILKVNT QDFRGLVRED AVLYLLEIPK GEMVTILAQS RADVYRDILA 60 0 

CGRGDSFFIR SHFECEKETP QSLAFTRGEV FRWDTLYDG KLGNWLAVRI GNELEKGLIP 660

NKSRAEQMAS VQNAQRDNAG DRAD FWRMRG QRSGVKKNLR KSREDLTAW SVSTKFPAYE 72 0 

RVLLREAGFK RPWLFGPIA DIAMEKLANE LPDWFQTAKT EPKDAGSEKS TGWRLNTVR 7 80 

QVIEQDKHAL LDVTPKAVDL LNYTQWFPIV IFFNPDSRQG VKTMRQRLMP TSNKSSRKLF 84 0 

DQANKLKKTC AHLFTATINL NSANDSWFGS LKDTIQHQQG EAVWVSEGKM EGMDDDPEDR 90 0 

MS YLTAMGAD YLSCDSRLIS DFEDTDGEGG AYTDNELDEP AEEPLVSSIT RSSEPVQHEE 96 0 

VRRGRPRAGT GEPGVFLALS WTAVCSGCCG RHS 993 



Seq ID NO: 33
Primekey #: 407584 

1 11 21 31 41 51 

I I I I I I 

MMWQKYAGSR RSMPLGARIL FHGVFYAGGF AIVYYLIQKF HSRALYYKLA VEQLQSHPEA 60 

QEALGPPLNI HYLKLIDREN FVDIVDAKLK IPVSGSKSEG LLYVHS SRGG PFQRWHLDEV 12 0 

FLELKDGQQI PVFKLSGENG DEVKKE 146 



Seq ID NO: 34
Primekey #: 450177 

1 11 21 31 41 51 

I I I I I I 

MTWCITTCNF DVDVDLLFQE NSTIGQKIAL SEKIVSVLPR MKCPHQLEPH QIQGMDFIHI 60 

F PWQWLVKR AIETKEEMGD YIRSYSVSQF QKTYSLPEDD DFIKRKEKAI KTWDLSEVY 12 0 

KPRRKYKRHQ GAEELLDEES RIHATLLEYG RRYGFSCQSK MEKAEDKKTA LPAGLSATEK 18 0 

ADAHEEDELR AAEEQRIQSL MTKMTAMANE ESRLTASSVG QIVGLCSAEI KQIVSEYAEK 24 0 

QSELSAEESP EKLGTSQLHR RKVISLNKQI AQKTKHLEEL RASHTS LQAR YNEAKKTLTE 3 00 

LKTYSEKLDK EQAALEKIES KADPSILQNL RALVAMNENL KSQEQEFKAH CREEMTRLQQ 3 60 

EIENLKAERA PRGDEKTLSS GEPPGTLTSA MTHDEDLDRR YNMEKEKLYK IRLLQARRNR 42 0 

EIAILHRKID EVPSRAELIQ YQKRFIELYR QISAVHKETK QFFTLYNTLD DKKVYLEKEI 4 80 

SLLNSIHENF S Q AMAS P AAR DQFLRQMEQI VEGIKQSRMK MEKKKQENKM RRDQLNDQYL 54 0 

ELLEKQRLYF KTVKEFKEEG RKNEMLLSKV KAKAS 575 



Seq ID NO: 35 
Primekey #: 407618 

1 11 21 31 41 51 

I I I I I I 

MAEYLASIFG TEKDKVNCSF YFKIGACRHG DRCSRLHNKP TFSQTIALLN IYRNPQNSSQ 60

SADGLRCAVS DVEMQEHYDE FFEEVFTEME EKYGEVEEMN VCDNLGDHLV GNVYVKFRRE 120

EDAEKAVIDL NNRWFNGQPI HAELSPVTDF REACCRQYEM GECTRGGFCN FMHLKPISRE 180

LRRELYGRRR KKHRSRSRSR ERRSRSRDRG RGGGGGGGGG GGGRERDRRR SRDRERSGRF 240



Seq ID NO: 36
Primekey #: 435937

1 11 21 31 41 51

I I I I I I

MSAGSATHPG AGGRRSKWDQ PAPAPLLFLP PAAPGGEVTS SGGSPGGTTA APSGALDAAA 60 

AVAAKINAML MAKGKLKPTQ NASEKLQAPG KGLTSNKSKD DLWAEVEIN DVPLTCRNLL 120 

TRGQTQDEIS RLSGAAVSTR GRFMTTEEKA KVGPGDRPLY LHVQGQTREL VDRAWRIKE 18 0 

IITNGWKAA TGTSPTFNGA TVTVYHQPAP IAQLSPAVSQ KPPFQSGMHY VQDKLFVGLE 240

HAVPTFNVKE KVEGPGCSYL QHIQIETGAK VFLRGKGSGC IEPASGREAF EPMYIYISHP 3 00 

KPEGLAAAKK LCENLLQTVH AEYSRFVNQI NTAVPLPGYT QPSAISSVPP QPPYYPSMGY 3 60 

QSGYPWPPP QQPVQPPYGV PSIVPPAVSL APGVLPALPT GVPPVPTQYP ITQVQPPAST 42 0 

GQSPMGGPFI PAAPVKTALP AGPQPQPQPQ PPLPSQPQAQ KRRFTEELPD ERESGLLGYQ 48 0 

HGPIHMTNLG TGFSSQNEIE GAGSKPASSS GKERERDRQL MPPPAFPVTG IKTESDERNG 540

SGTLTGSHGE CDIAGGTGEW LRLV 564 



All publications and patent applications cited in this specification are herein
incorporated by reference as if each individual publication or patent application were
specifically and individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way
of illustration and example for clarity and understanding, it will be readily apparent to one
of ordinary skill in the art in light of the teachings of this invention that certain changes and
modifications may be made thereto without departing from the spirit and scope of the
appended claims.

As can be appreciated from the disclosure provided above, the present
invention has a wide variety of applications. Accordingly, the following examples are
offered for illustration purposes and are not intended to be construed as a limitation on the
invention in any way. Those of skill in the art will readily recognize a variety of non-critical
parameters that could be changed or modified to yield essentially similar results.
WHAT IS CLAIMED IS:



1. A method of diagnosing the health status of a biological sample, said
method comprising the steps of:
a) generating a gene expression pattern of the biological sample, and
b) comparing the gene expression pattern of the biological sample with the
reference sets of Tables 1-6,
wherein a match between the gene expression pattern of the biological sample
and one or more genes of the reference sets provides a diagnosis of the biological sample.

2. The method of claim 1, wherein the biological sample comprises cells
obtained from a biopsy sample.

3. The method of claim 1, wherein the biological sample is diagnosed as healthy
tissue.

4. The method of claim 1, wherein the biological sample is diagnosed as
having the potential to metastasize.

5. The method of claim 1, wherein the diagnosis identifies the tissue as
having metastatic cancer.

7. The method of claim 1, wherein the comparison of the gene expression
pattern of the biological sample and the reference sets is made with reference to at least one
classifier gene from Tables 1-6.

8. The method of claim 1, wherein the comparison of the gene expression
pattern of the biological sample and the reference sets is made by comparing RNA expression
profiles.

9. The method of claim 1, wherein the comparison of the gene expression
pattern of the biological sample and the reference sets is made by comparing protein
expression profiles.

10. The method of claim 10, wherein the protein expression profile is
evaluated using antibodies.
11. A method for prognostic evaluation of the metastatic potential of
colorectal cancer comprising the steps of:
a) generating a gene expression pattern of a biological sample from the
colorectal cancer, and
b) comparing the gene expression pattern of the biological sample with the
reference sets of Tables 1-6,
wherein a match between the gene expression pattern of the biological sample
and one or more reference sets provides a prognostic evaluation of the metastatic potential of
the colorectal cancer.

12. The method of claim 12, wherein a match between the gene expression
pattern of the biological sample and the reference set representing colon cancer metastasis or
Dukes' stage D colorectal cancer is indicative of poor prognosis.

13. A method for evaluating the progress of a treatment regimen for
metastatic colorectal cancer comprising the steps of:
a) generating a first gene expression pattern of a first biological sample from a
patient,
b) comparing the first gene expression pattern of the first biological sample
with the reference sets of Tables 1-6,
c) obtaining a match between the first gene expression pattern of the first
biological sample and one or more reference sets of Tables 1-6, thereby providing an
initial diagnosis of metastatic colorectal cancer,
d) administering to the patient a therapeutically effective amount of a
compound that modulates the metastatic colorectal cancer,
e) generating a second gene expression pattern of a second biological sample
from the patient,
f) comparing the second gene expression pattern of the second biological
sample with the reference sets of Tables 1-6,
g) obtaining a match between the second gene expression pattern of the second
biological sample and one or more reference sets of Tables 1-6, and
h) comparing the match between the first gene expression pattern of the first
biological sample and the match between the second gene expression pattern of the second
biological sample,
wherein the comparison indicates the progress of the treatment for metastatic
colorectal cancer.

14. A method for evaluating the efficacy of drug candidates for use in the
treatment of metastatic colorectal cancer comprising the steps of:
a) contacting a cell or tissue culture that has a gene expression profile
indicative of metastatic colorectal cancer with an effective amount of a test compound,
b) generating a gene expression profile of the contacted cell or tissue culture,
c) comparing the gene expression pattern of the contacted cell culture with the
defined sets of genes of Tables 1-6, and
d) obtaining a match between the gene expression pattern of the contacted cell
culture and one or more reference sets of Tables 1-6, thereby determining the efficacy of
the drug for the treatment of metastatic colorectal cancer.

15. A kit for diagnosing the health status of a biological sample, said kit
comprising:
a) nucleic acid probes that specifically bind to nucleotide sequences
from reference sets of Tables 1-6, and
b) means of labeling nucleic acids.

17. The kit of claim 15, wherein the nucleic acid probes identify metastatic
cancer derived from a primary tumor in an organ selected from the group consisting of heart,
lung, pancreas, breast, prostate, and colon.

18. A kit for diagnosing the health status of a biological sample, said kit
comprising:
a) antibodies or ligands that specifically bind to polypeptides encoded
by genes of the reference sets of Tables 1-6, and
b) means of labeling the antibodies or ligands that specifically bind to
polypeptides encoded by genes of the reference sets of Tables 1-6.
19. The kit of claim 17, wherein the antibodies or ligands identify
metastatic cancer derived from a primary tumor in an organ selected from the group
consisting of heart, lung, pancreas, breast, prostate, and colon.

20. A method for selecting patients for therapy of colon cancer based on
the steps of:
a) generating a gene expression pattern of a biological sample from the
patient, and
b) comparing the gene expression pattern of the biological sample with the
reference sets of Tables 1-6,
wherein a match between the gene expression pattern of the biological sample
and one or more genes from the reference sets provides an evaluation of the metastatic
potential of the colorectal cancer and thereby determines whether a patient will be selected
for therapy.
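
The claims recite comparing a sample's gene expression pattern with reference sets and using the resulting match as a readout, including before and after treatment. The claims do not define how a match is scored, so the Python below is only a sketch under stated assumptions: a sample is a mapping from gene to log2 fold change, a reference set is a mapping from gene to an expected direction ("up" or "down"), and the score is the fraction of reference genes whose direction the sample shares beyond a threshold. The gene names, set names, threshold, and function names are hypothetical placeholders and are not taken from Tables 1-6.

```python
def match_fraction(sample, reference, threshold=1.0):
    """Fraction of reference genes whose direction of change the sample shares.

    sample: dict gene -> log2 fold change versus a control (assumed input form).
    reference: dict gene -> "up" or "down" (hypothetical stand-in for a table).
    threshold: minimum absolute log2 fold change counted as a change (assumed).
    """
    if not reference:
        raise ValueError("reference set is empty")
    hits = 0
    for gene, direction in reference.items():
        value = sample.get(gene)
        if value is None:
            continue  # gene not measured: counted as no match
        if direction == "up" and value >= threshold:
            hits += 1
        elif direction == "down" and value <= -threshold:
            hits += 1
    return hits / len(reference)


def best_match(sample, reference_sets, threshold=1.0):
    """Return (name, score) of the reference set with the highest match fraction."""
    scores = {name: match_fraction(sample, ref, threshold)
              for name, ref in reference_sets.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical placeholder data; not taken from Tables 1-6.
reference_sets = {
    "metastatic": {"GENE_A": "up", "GENE_B": "up", "GENE_C": "down", "GENE_D": "down"},
    "normal": {"GENE_A": "down", "GENE_B": "down", "GENE_C": "up", "GENE_D": "up"},
}
before = {"GENE_A": 2.5, "GENE_B": 1.5, "GENE_C": -2.0, "GENE_D": -0.5}
after = {"GENE_A": 0.5, "GENE_B": 1.5, "GENE_C": -0.25, "GENE_D": 0.0}

# Single-sample comparison against all reference sets.
label, score = best_match(before, reference_sets)

# Before/after comparison against the same reference set.
change = match_fraction(after, reference_sets["metastatic"]) - score
print(label, score, change)
```

In this sketch a falling match fraction against the "metastatic" placeholder set between the first and second sample is read as movement away from that pattern; any real decision rule, cutoff, or statistical test would have to come from the reference data themselves.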